Given a constrained minimization problem, under what conditions does there exist a related,
unconstrained problem having the same minimum points? This basic question in global optimization
motivates this paper, which answers it from the viewpoint of statistical mechanics. In this context, it
reduces to the fundamental question of the equivalence and nonequivalence of ensembles, which is
analyzed using the theory of large deviations and the theory of convex functions.

In a 2000 paper appearing in the Journal of Statistical Physics, we gave necessary and sufficient
conditions for ensemble equivalence and nonequivalence in terms of support and concavity
properties of the microcanonical entropy. In later research we significantly extended those results
by introducing a class of Gaussian ensembles, which are obtained from the canonical ensemble by
adding an exponential factor involving a quadratic function of the Hamiltonian. The present paper
is an overview of our work on this topic. Our most important discovery is that even when the
microcanonical and canonical ensembles are not equivalent, one can often find a Gaussian ensemble
that satisfies a strong form of equivalence with the microcanonical ensemble known as universal
equivalence. When translated back into optimization theory, this implies that an unconstrained
minimization problem involving a Lagrange multiplier and a quadratic penalty function has the same
minimum points as the original constrained problem.

The results on ensemble equivalence discussed in this paper are illustrated in the context of the 
Curie-Weiss-Potts lattice-spin model. 
I. INTRODUCTION

At the beginning of his groundbreaking 1973 paper, Oscar Lanford describes the underlying program of
statistical mechanics as follows (p. 1 of that paper).

The objective of statistical mechanics is to explain the macroscopic properties of matter on 
the basis of the behavior of the atoms and molecules of which it is composed. One of the 
most striking facts about macroscopic matter is that in spite of being fantastically complicated 
on the atomic level — to specify the positions and velocities of all molecules in a glass of 
water would mean specifying something of the order of $10^{25}$ parameters — its macroscopic
behavior is describable in terms of a very small number of parameters; e.g., the temperature 
and density for a system containing only one kind of molecule. 

Lanford shows how the theory of large deviations enables this objective to be realized. In statistical 
mechanics one determines the macroscopic behavior of physical systems not from the deterministic laws of 
Newtonian mechanics, but from a probability distribution that expresses both the behavior of the system on 
the microscopic level and the intrinsic inability to describe precisely what is happening on that level. Using
the theory of large deviations, one shows that, with probability converging to 1 exponentially fast as the
number of particles tends to $\infty$, the macroscopic behavior is describable in terms of a very small number of
parameters.

The success of this program depends on the correct choice of probability distribution, also known as 
an ensemble. One starts with a prior measure on configuration space, which, as an expression of the lack 
of information concerning the behavior of the system on the atomic level, is often taken to be the uniform 
measure. The most natural choice of ensemble is the microcanonical ensemble, obtained by conditioning 
the prior measure on the set of configurations for which the Hamiltonian per particle equals a constant 
energy u. Gibbs introduced a mathematically more tractable probability distribution known as the Gibbs 
ensemble or the canonical ensemble, in which the conditioning that defines the microcanonical ensemble 
is replaced by an exponential factor involving the Hamiltonian and the inverse temperature $\beta$, a parameter
dual to the energy parameter $u$.

Among other reasons, the canonical ensemble was introduced by Gibbs in the hope that in the limit
$n \to \infty$ the two ensembles are equivalent; i.e., all macroscopic properties of the model obtained via the
microcanonical ensemble could be realized as macroscopic properties obtained via the canonical ensemble. 
While ensemble equivalence is valid for many standard and important models, ensemble equivalence does 
not hold in general, as numerous studies cited later in this introduction show. There are many examples 
of statistical mechanical models for which nonequivalence of ensembles holds over a wide range of model 
parameters and for which physically interesting microcanonical equilibria are often omitted by the canonical 
ensemble. 

The present paper is an overview of our work on this topic. One of the beautiful aspects of the theory 
is that it elucidates a fundamental issue in global optimization, which in fact motivated our work on the 
Gaussian ensemble. Given a constrained minimization problem, under what conditions does there exist a 
related, unconstrained minimization problem having the same minimum points? 

In order to explain the connection between ensemble equivalence and global optimization and in order to
outline the contributions of this paper, we introduce some notation. Let $\mathcal{X}$ be a space, $I$ a function
mapping $\mathcal{X}$ into $[0, \infty]$, and $H$ a function mapping $\mathcal{X}$ into $\mathbb{R}$. For $u \in \mathbb{R}$ we consider the
following constrained minimization problem:

\[
  \text{minimize } I(x) \text{ over } x \in \mathcal{X} \text{ subject to the constraint } H(x) = u. \tag{1.1}
\]

A partial answer to the question posed at the end of the preceding paragraph can be found by introducing
the following related, unconstrained minimization problem for $\beta \in \mathbb{R}$:

\[
  \text{minimize } I(x) + \beta H(x) \text{ over } x \in \mathcal{X}. \tag{1.2}
\]

The theory of Lagrange multipliers outlines suitable conditions under which the solutions of the constrained
problem (1.1) lie among the critical points of $I + \beta H$. However, it does not give, as we will do in Theorems
3.1 and 3.3, necessary and sufficient conditions for the solutions of (1.1) to coincide with the solutions of
the unconstrained minimization problem (1.2) and with the solutions of the unconstrained minimization
problem appearing in (1.5).

We denote by $\mathcal{E}^u$ and $\mathcal{E}_\beta$ the respective sets of solutions of the minimization problems (1.1) and (1.2).
These problems arise in a natural way in the context of equilibrium statistical mechanics [21], where $u$
denotes the energy and $\beta$ the inverse temperature. As we will outline in Section 2, the theory of large
deviations allows one to identify the solutions of these problems as the respective sets of equilibrium
macrostates for the microcanonical ensemble and the canonical ensemble.

The paper [21] analyzes equivalence of ensembles in terms of relationships between $\mathcal{E}^u$ and $\mathcal{E}_\beta$. In turn,
these relationships are expressed in terms of support and concavity properties of the microcanonical entropy

\[
  s(u) = -\inf\{I(x) : x \in \mathcal{X},\ H(x) = u\}. \tag{1.3}
\]
The main results in [21] are summarized in Theorem 3.1. Part (a) of that theorem states that if $s$ has a
strictly supporting line at an energy value $u$, then full equivalence of ensembles holds in the sense that there
exists a $\beta$ such that $\mathcal{E}^u = \mathcal{E}_\beta$. In particular, if $s$ is strictly concave on dom $s$, then $s$ has a strictly
supporting line at all $u$ except possibly boundary points [Thm. 3.2(a)] and thus full equivalence of ensembles
holds at all such $u$. In this case we say that the microcanonical and canonical ensembles are universally
equivalent.

The most surprising result, given in part (c), is that if $s$ does not have a supporting line at $u$, then
nonequivalence of ensembles holds in the strong sense that $\mathcal{E}^u \cap \mathcal{E}_\beta = \emptyset$ for all $\beta \in \mathbb{R}$. That is, if $s$
does not have a supporting line at $u$ (equivalently, if $s$ is not concave at $u$), then microcanonical equilibrium
macrostates cannot be realized canonically. This is to be contrasted with part (d), which states that for any
$x \in \mathcal{E}_\beta$ there exists $u$ such that $x \in \mathcal{E}^u$; i.e., canonical equilibrium macrostates can always be realized
microcanonically. Thus of the two ensembles the microcanonical is the richer.

The paper [12] addresses the natural question suggested by part (c) of Theorem 3.1. If the
microcanonical ensemble is not equivalent with the canonical ensemble on a subset of energy values $u$, then is
it possible to replace the canonical ensemble with another ensemble that is universally equivalent with the
microcanonical ensemble? We answered this question by introducing a penalty function $\gamma[H(x) - u]^2$ into
the unconstrained minimization problem (1.2), obtaining the following:

\[
  \text{minimize } I(x) + \beta H(x) + \gamma[H(x) - u]^2 \text{ over } x \in \mathcal{X}. \tag{1.4}
\]

Since for each $x \in \mathcal{X}$

\[
  \lim_{\gamma \to \infty} \gamma[H(x) - u]^2 =
  \begin{cases} 0 & \text{if } H(x) = u, \\ \infty & \text{if } H(x) \neq u, \end{cases}
\]

it is plausible that for all sufficiently large $\gamma$ minimum points of the penalized problem (1.4) are also
minimum points of the constrained problem (1.1). Since $\beta$ can be adjusted, (1.4) is equivalent to the following:

\[
  \text{minimize } I(x) + \beta H(x) + \gamma[H(x)]^2 \text{ over } x \in \mathcal{X}. \tag{1.5}
\]
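
The step from (1.4) to (1.5) can be checked by expanding the penalty term. The following short calculation
is a sketch only; the symbol $\beta'$ is introduced here for illustration and does not appear elsewhere in the paper.

```latex
\begin{align*}
I(x) + \beta H(x) + \gamma [H(x) - u]^2
  &= I(x) + \beta H(x) + \gamma [H(x)]^2 - 2\gamma u H(x) + \gamma u^2 \\
  &= I(x) + (\beta - 2\gamma u) H(x) + \gamma [H(x)]^2 + \gamma u^2 .
\end{align*}
```

Because $\gamma u^2$ does not depend on $x$, the minimum points of (1.4) with multiplier $\beta$ coincide with the
minimum points of (1.5) with multiplier $\beta' = \beta - 2\gamma u$.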

The theory of large deviations allows one to identify the solutions of this problem as the set of equilibrium
macrostates for the so-called Gaussian ensemble. It is obtained from the canonical ensemble by adding an
exponential factor involving $\gamma h_n^2$, where $h_n$ denotes the Hamiltonian energy per particle. The utility of the
Gaussian ensemble rests on the simplicity with which the quadratic function $\gamma u^2$ defining this ensemble
enters the formulation of ensemble equivalence. Essentially all the results in [21] concerning ensemble
equivalence, including Theorem 3.1, generalize to the setting of the Gaussian ensemble by replacing the
microcanonical entropy $s(u)$ by the generalized microcanonical entropy

\[
  s_\gamma(u) = s(u) - \gamma u^2. \tag{1.6}
\]

The generalization of Theorem 3.1 is stated in Theorem 3.3, which gives all possible relationships between
the set $\mathcal{E}^u$ of equilibrium macrostates for the microcanonical ensemble and the set $\mathcal{E}_{\beta,\gamma}$ of equilibrium
macrostates for the Gaussian ensemble. These relationships are expressed in terms of support and concavity
properties of $s_\gamma$.

For the purpose of applications the most important consequence of Theorem 3.3 is given in part (a),
which states that if $s_\gamma$ has a strictly supporting line at an energy value $u$, then full equivalence of ensembles
holds in the sense that there exists a $\beta$ such that $\mathcal{E}^u = \mathcal{E}_{\beta,\gamma}$. In particular, if $s_\gamma$ is strictly concave on
dom $s$, then $s_\gamma$ has a strictly supporting line at all $u$ except possibly boundary points [Thm. 3.4(a)] and thus
full equivalence of ensembles holds at all such $u$. In this case we say that the microcanonical and Gaussian
ensembles are universally equivalent.
In the case in which $s$ is $C^2$ and $s''$ is bounded above on the interior of dom $s$, the strict concavity
of $s_\gamma$ is easy to show. In fact, the strict concavity is a consequence of

\[
  s_\gamma''(u) = s''(u) - 2\gamma < 0 \quad \text{for all } u \in \mathrm{int}(\mathrm{dom}\, s),
\]

and this in turn is valid for all sufficiently large $\gamma$ [Thm. 4.2]. For such $\gamma$ it follows, therefore, that the
microcanonical and Gaussian ensembles are universally equivalent.
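
The effect of the quadratic term can be seen in a small numerical experiment. The sketch below is not taken
from the paper: the grid of macrostates, the double-well function $I(x) = (x^2 - 1)^2$, and the choice
$H(x) = x$ are assumptions made purely for illustration. With these choices $s(u) = -I(u)$ and
$s''(u) = 4 - 12u^2 \le 4$, so any $\gamma > 2$ makes $s_\gamma$ strictly concave.

```python
# Toy illustration; every modelling choice here is an assumption for this sketch:
# macrostates x on a grid in [-1, 1], H(x) = x, and a double-well I(x), so that
# s(u) = -I(u) fails to be concave for |u| < 1/sqrt(3).

def I(x):
    """Nonnegative function with two wells, at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2

def H(x):
    """Stand-in for the interaction representation function (identity here)."""
    return x

X = [k / 100 for k in range(-100, 101)]  # finite grid of macrostates

def argmin_set(objective, points, tol=1e-12):
    """Return all points whose objective value is within tol of the minimum."""
    values = [objective(x) for x in points]
    best = min(values)
    return [x for x, v in zip(points, values) if v <= best + tol]

def microcanonical(u):
    """Solutions of (1.1): minimize I subject to H(x) = u (u must lie on the grid)."""
    feasible = [x for x in X if abs(H(x) - u) < 1e-12]
    return argmin_set(I, feasible)

def canonical(beta):
    """Solutions of (1.2): minimize I + beta * H."""
    return argmin_set(lambda x: I(x) + beta * H(x), X)

def gaussian(beta, gamma):
    """Solutions of (1.5): minimize I + beta * H + gamma * H^2."""
    return argmin_set(lambda x: I(x) + beta * H(x) + gamma * H(x) ** 2, X)

u = 0.0
betas = [b / 10 for b in range(-50, 51)]
found_canonically = any(u in canonical(beta) for beta in betas)

print(microcanonical(u))   # constrained minimizer at u = 0
print(found_canonically)   # whether some beta on this grid recovers it
print(gaussian(0.0, 3.0))  # penalized minimizer with gamma = 3 > 2
```

In this toy setting the constrained minimizer at $u = 0$ is never a minimizer of $I + \beta H$, since the value
at one of the wells is always lower, whereas the penalized problem with $\gamma = 3$ and $\beta = 0$ has $x = 0$ as
its unique minimizer on the grid.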

Defined in (2.6), the Gaussian ensemble is mathematically much more tractable than the microcanonical
ensemble, which is defined in terms of conditioning. The simpler form of the Gaussian ensemble is reflected
in the simpler form of the unconstrained minimization problem (1.5) defining the set $\mathcal{E}_{\beta,\gamma}$ of Gaussian
equilibrium macrostates. In (1.5) the constraint appearing in the minimization problem (1.1) defining the
set $\mathcal{E}^u$ of microcanonical equilibrium macrostates is replaced by the linear and quadratic terms involving
$H(x)$. The virtue of the Gaussian formulation should be clear. When the microcanonical and Gaussian
ensembles are universally equivalent, then from a numerical point of view, it is better to use the Gaussian
ensemble because this ensemble, contrary to the microcanonical one, does not involve an equality constraint,
which is difficult to implement numerically. Furthermore, within the context of the Gaussian ensemble, it
is possible to use Monte Carlo techniques without any constraint on the sampling [8, 9].

By giving necessary and sufficient conditions for the equivalence of the three ensembles in Theorems
3.1 and 3.3, we make contact with the duality theory of global optimization and the method of augmented
Lagrangians [3, §2.2], [45, §6.4]. In the context of global optimization the primal function and the dual
function play the same roles that the microcanonical entropy (resp., generalized microcanonical entropy)
and the canonical free energy (resp., Gaussian free energy) play in statistical mechanics. Similarly, the
replacement of the Lagrangian by the augmented Lagrangian in global optimization is paralleled by our
replacement of the canonical ensemble by the Gaussian ensemble.

The Gaussian ensemble is a special case of the generalized canonical ensemble, which is obtained from
the canonical ensemble by adding an exponential factor involving $g(h_n)$, where $g$ is a continuous function
that is bounded below. Our paper [12] gives all possible relationships between the sets of equilibrium
macrostates for the microcanonical and generalized canonical ensembles in terms of support and concavity
properties of an appropriate entropy function. Our paper [54] shows that the generalized canonical ensemble
can be used to transform metastable or unstable nonequilibrium macrostates for the standard canonical
ensemble into stable equilibrium macrostates for the generalized canonical ensemble.

Equivalence and nonequivalence of ensembles is the subject of a large literature. An overview is given in
the introduction of [42]. A number of theoretical papers on this topic, including [16, 21, 26, 28, 41, 42, 50],
investigate equivalence of ensembles using the theory of large deviations. In [41, §7] and [42, §7.3] there
is a discussion of nonequivalence of ensembles for the simplest mean-field model in statistical mechanics;
namely, the Curie-Weiss model of a ferromagnet. However, despite the mathematical sophistication of these
and other studies, none of them except for our papers [12, 21] explicitly addresses the general issue of the
nonequivalence of ensembles.

Nonequivalence of ensembles has been observed in a wide range of systems that involve long-range
interactions and that can be studied by the methods of [12, 21]. In all of these cases the microcanonical
formulation gives rise to a richer set of equilibrium macrostates. For example, it has been shown
computationally that the strongly reversing zonal-jet structures on Jupiter as well as the Great Red Spot fall
into the nonequivalent range of an appropriate microcanonical ensemble. Other models for which ensemble
nonequivalence has been observed include a number of long-range, mean-field spin models including the
Hamiltonian mean-field model [14, 39], the mean-field X-Y model [15], and the mean-field
Blume-Emery-Griffiths model. For a mean-field version of the Potts model called the Curie-Weiss-Potts model,
equivalence and nonequivalence of ensembles is analyzed in detail elsewhere. Ensemble nonequivalence
has also been observed in models of turbulent vorticity dynamics, models of plasmas, gravitational systems,
and a model of the Lennard-Jones gas. A detailed discussion of ensemble nonequivalence for models of
coherent structures in two-dimensional turbulence is given in [21, §1.4].

Gaussian ensembles were introduced and studied further in a series of earlier papers, including
[9, 33, 34, 52]. As these papers discuss, an important feature of Gaussian ensembles is that they allow one to
account for ensemble-dependent effects in finite systems. Although not referred to by name, the Gaussian
ensemble also plays a key role in earlier work in which it is used to address equivalence-of-ensemble
questions for a point-vortex model of fluid turbulence.

Another seed out of which the research summarized in the present paper germinated is the paper [22].
There we study the equivalence of the microcanonical and canonical ensembles for statistical equilibrium
models of coherent structures in two-dimensional and quasi-geostrophic turbulence. Numerical computations
demonstrate that, as in other cases, nonequivalence of ensembles occurs over a wide range of model
parameters and that physically interesting microcanonical equilibria are often omitted by the canonical
ensemble. In addition, in Section 5 of [22], we establish the nonlinear stability of the steady mean flows
corresponding to microcanonical equilibria via a new Lyapunov argument. The associated stability theorem
refines the well-known Arnold stability theorems, which do not apply when the microcanonical and
canonical ensembles are not equivalent. The Lyapunov functional appearing in this new stability theorem is
defined in terms of a generalized thermodynamic potential similar in form to $I(x) + \beta H(x) + \gamma[H(x)]^2$, the
minimum points of which define the set of equilibrium macrostates for the Gaussian ensemble [see (2.14)].

Our goal in this paper is to give an overview of our theoretical work on ensemble equivalence presented
in [12, 21]. The paper [13] investigates the physical principles underlying this theory. In Section 2 of the
present paper, we first state the hypotheses on the statistical mechanical models to which the theory of the 
present paper applies. We then define the three ensembles — microcanonical, canonical, and Gaussian 
— and specify the three associated sets of equilibrium macrostates in terms of large deviation principles. 
In Section 3 we state two sets of results on ensemble equivalence. The first involves the equivalence of 
the microcanonical and canonical ensembles, necessary and sufficient conditions for which are given in 
terms of support properties of the microcanonical entropy $s$ defined in (1.3). The second involves the
equivalence of the microcanonical and Gaussian ensembles, necessary and sufficient conditions for which
are given in terms of support properties of the generalized microcanonical entropy $s_\gamma$ defined in (1.6).
Section 4 addresses a basic foundational issue in statistical mechanics. There we show that when the 
canonical ensemble is nonequivalent to the microcanonical ensemble on a subset of energy values u, it can 
often be replaced by a Gaussian ensemble that is universally equivalent to the microcanonical ensemble. 
In Section 5 the results on ensemble equivalence discussed in this paper are illustrated in the context of 
the Curie-Weiss-Potts lattice-spin model, a mean-field approximation to the nearest-neighbor Potts model. 
Several of the results presented near the end of this section are new. 

II. DEFINITIONS OF MODELS AND ENSEMBLES 

One of the objectives of this paper is to show that when the canonical ensemble is nonequivalent to the 
microcanonical ensemble on a subset of energy values u, it can often be replaced by a Gaussian ensemble 
that is equivalent to the microcanonical ensemble for all u. Before introducing the various ensembles as 
well as the methodology for proving this result, we first specify the class of statistical mechanical models 
under consideration. The models are defined in terms of the following quantities.

1. A sequence of probability spaces $(\Omega_n, \mathcal{F}_n, P_n)$ indexed by $n \in \mathbb{N}$, which typically represents a
sequence of finite dimensional systems. The $\Omega_n$ are the configuration spaces, $\omega \in \Omega_n$ are the
microstates, and the $P_n$ are the prior measures on the $\sigma$-fields $\mathcal{F}_n$.

2. A sequence of positive scaling constants $a_n \to \infty$ as $n \to \infty$. In general $a_n$ equals the total number
of degrees of freedom in the model. In many cases $a_n$ equals the number of particles.
3. For each $n \in \mathbb{N}$ a measurable function $H_n$ mapping $\Omega_n$ into $\mathbb{R}$. For $\omega \in \Omega_n$ we define the energy
per degree of freedom by

\[
  h_n(\omega) = \frac{1}{a_n} H_n(\omega).
\]

Typically, $H_n$ in item 3 equals the Hamiltonian, which is associated with energy conservation in the model.
The theory is easily generalized by replacing $H_n$ by a vector of appropriate functions representing additional
dynamical invariants associated with the model [12, 21].

A large deviation analysis of the general model is possible provided that there exist a space of
macrostates, macroscopic variables, and an interaction representation function and provided that the
macroscopic variables satisfy the large deviation principle (LDP) on the space of macrostates. These
concepts are explained next.

4. Space of macrostates. This is a complete, separable metric space $\mathcal{X}$, which represents the set of all
possible macrostates.

5. Macroscopic variables. These are a sequence of random variables $Y_n$ mapping $\Omega_n$ into $\mathcal{X}$. These
functions associate a macrostate in $\mathcal{X}$ with each microstate $\omega \in \Omega_n$.

6. Interaction representation function. This is a bounded, continuous function $H$ mapping $\mathcal{X}$ into
$\mathbb{R}$ such that as $n \to \infty$

\[
  h_n(\omega) = H(Y_n(\omega)) + o(1) \quad \text{uniformly for } \omega \in \Omega_n; \tag{2.1}
\]

i.e.,

\[
  \lim_{n \to \infty} \sup_{\omega \in \Omega_n} |h_n(\omega) - H(Y_n(\omega))| = 0.
\]

The function $H$ enables us to write $h_n$, either exactly or asymptotically, as a function of the macrostate
via the macroscopic variables $Y_n$.

7. LDP for the macroscopic variables. There exists a function $I$ mapping $\mathcal{X}$ into $[0, \infty]$ and having
compact level sets such that with respect to $P_n$ the sequence $Y_n$ satisfies the LDP on $\mathcal{X}$ with rate
function $I$ and scaling constants $a_n$. In other words, for any closed subset $F$ of $\mathcal{X}$

\[
  \limsup_{n \to \infty} \frac{1}{a_n} \log P_n\{Y_n \in F\} \le -\inf_{x \in F} I(x),
\]

and for any open subset $G$ of $\mathcal{X}$

\[
  \liminf_{n \to \infty} \frac{1}{a_n} \log P_n\{Y_n \in G\} \ge -\inf_{x \in G} I(x).
\]

It is helpful to summarize the LDP by the formal notation $P_n\{Y_n \in dx\} \asymp \exp[-a_n I(x)]$. This
notation expresses the fact that, to a first degree of approximation, $P_n\{Y_n \in dx\}$ behaves like an
exponential that decays to 0 whenever $I(x) > 0$.
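
The content of item 7 can be illustrated with the simplest possible example. The sketch below is an
illustration only and is not one of the models treated in the paper: it assumes $n$ independent fair coin
flips, takes $Y_n$ to be the fraction of heads and $a_n = n$, and uses the relative entropy with respect to
the fair coin as the rate function. The function names are hypothetical.

```python
import math

def rate(x):
    """Relative entropy of Bernoulli(x) with respect to Bernoulli(1/2)."""
    total = math.log(2.0)
    for p in (x, 1.0 - x):
        if p > 0.0:
            total += p * math.log(p)
    return total

def empirical_rate(n, a):
    """Return -(1/n) log P{Y_n >= a} for n fair coin flips, computed exactly."""
    k_min = math.ceil(a * n - 1e-9)  # smallest head count with k/n >= a
    favorable = sum(math.comb(n, k) for k in range(k_min, n + 1))
    log_prob = math.log(favorable) - n * math.log(2.0)
    return -log_prob / n

a = 0.7
for n in (200, 2000):
    # For the closed set F = [a, 1] with a > 1/2, inf over F of the rate is rate(a).
    print(n, empirical_rate(n, a), rate(a))
```

As $n$ grows, the exact decay rate of the tail probability approaches the infimum of the rate function over
the closed set $[a, 1]$, which is the behavior summarized by the formal notation above.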

A wide variety of statistical mechanical models satisfy the hypotheses listed in items 1-7 at the start of
this section and so can be studied by the methods of [12, 21]. These include the following.
1. The mean-field Blume-Emery-Griffiths model [4] is one of the simplest lattice-spin models known
to exhibit, in the mean-field approximation, both a continuous, second-order phase transition and
a discontinuous, first-order phase transition. The space of macrostates for this model is the set of
probability measures on a certain finite set, the macroscopic variables are the empirical measures
associated with the spin configurations, and the associated LDP is Sanov's Theorem, for which the
rate function is a relative entropy. Various features of this model are studied in several papers,
including [24, 25].

2. The Curie-Weiss-Potts model is a mean-field approximation to the nearest-neighbor Potts model.
For the Curie-Weiss-Potts model, the space of macrostates, the macroscopic variables, and
the associated LDP are similar to those in the mean-field Blume-Emery-Griffiths model. The
Curie-Weiss-Potts model nicely illustrates the general results on ensemble equivalence discussed in this
paper and is discussed in Section 5.

3. Short-range spin systems such as the Ising model on $\mathbb{Z}^d$ and numerous generalizations can also be
handled by the methods of this paper. The large deviation techniques required to analyze these
models are much more subtle than in the case of the long-range, mean-field models considered in
items 1 and 2. For the Ising model the space of macrostates is the space of translation-invariant
probability measures on $\mathbb{Z}^d$, the macroscopic variables are the empirical processes associated with
the spin configurations, and the rate function in the associated LDP is the mean relative entropy.

4. The Miller-Robert model is a model of coherent structures in an ideal, two-dimensional fluid that
includes all the exact invariants of the vorticity transport equation. The space of macrostates
is the space of Young measures on the vorticity field. The large deviation analysis of this model
developed first in [47] and more recently in [6] gives a rigorous derivation of maximum entropy
principles governing the equilibrium behavior of the ideal fluid.

5. In geophysical applications, another version of the model in item 4 is preferred, in which the
enstrophy integrals are treated canonically and the energy and circulation are treated microcanonically.
In those formulations, the space of macrostates is $L^2(\Lambda)$ or $L^\infty(\Lambda)$ depending on the constraints
on the vorticity field. The large deviation analysis is carried out in [20]. The paper [22] shows how
the nonlinear stability of the steady mean flows arising as equilibrium macrostates can be established
by utilizing the appropriate generalized thermodynamic potentials.

6. A statistical equilibrium model of solitary wave structures in dispersive wave turbulence governed
by a nonlinear Schrödinger equation is studied in [23]. The large deviation analysis given in [23]
derives rigorously the concentration phenomenon observed in long-time numerical simulations and
predicted by mean-field approximations. The space of macrostates is $L^2(\Lambda)$, where $\Lambda$ is a
bounded interval or more generally a bounded domain in $\mathbb{R}^d$. The macroscopic variables are certain
Gaussian processes.

We now return to the general theory, first introducing the function whose support and concavity
properties completely determine all aspects of ensemble equivalence and nonequivalence. This function is the
microcanonical entropy, defined for $u \in \mathbb{R}$ by

\[
  s(u) = -\inf\{I(x) : x \in \mathcal{X},\ H(x) = u\}. \tag{2.2}
\]

Since $I$ maps $\mathcal{X}$ into $[0, \infty]$, $s$ maps $\mathbb{R}$ into $[-\infty, 0]$. Moreover, since $I$ is lower semicontinuous and
$H$ is continuous on $\mathcal{X}$, $s$ is upper semicontinuous on $\mathbb{R}$. We define dom $s$ to be the set of $u \in \mathbb{R}$ for
which $s(u) > -\infty$. In general, dom $s$ is nonempty since $-s$ is a rate function [21, Prop. 3.1(a)]. For each
$u \in$ dom $s$, $r > 0$, $n \in \mathbb{N}$, and set $B \in \mathcal{F}_n$ the microcanonical ensemble is defined to be the conditioned
measure

\[
  P_n^{u,r}\{B\} = P_n\{B \mid h_n \in [u - r, u + r]\}. \tag{2.3}
\]

As shown in [21, p. 1027], if $u \in$ dom $s$, then for all sufficiently large $n$, $P_n\{h_n \in [u - r, u + r]\} > 0$;
thus the conditioned measures $P_n^{u,r}$ are well defined.

A mathematically more tractable probability measure is the canonical ensemble. For each $n \in \mathbb{N}$, $\beta \in \mathbb{R}$,
and set $B \in \mathcal{F}_n$ we define the partition function

\[
  Z_n(\beta) = \int_{\Omega_n} \exp[-a_n \beta h_n] \, dP_n,
\]

which is well defined and finite, and the probability measure

\[
  P_{n,\beta}\{B\} = \frac{1}{Z_n(\beta)} \int_B \exp[-a_n \beta h_n] \, dP_n. \tag{2.4}
\]

The measures $P_{n,\beta}$ are Gibbs states that define the canonical ensemble for the given model.

The Gaussian ensemble is a natural perturbation of the canonical ensemble. For each $n \in \mathbb{N}$, $\beta \in \mathbb{R}$,
and $\gamma \in [0, \infty)$ we define the Gaussian partition function

\[
  Z_n(\beta, \gamma) = \int_{\Omega_n} \exp[-a_n \beta h_n - a_n \gamma h_n^2] \, dP_n. \tag{2.5}
\]

This is well defined and finite because the $h_n$ are bounded. For $B \in \mathcal{F}_n$ we also define the probability
measure

\[
  P_{n,\beta,\gamma}\{B\} = \frac{1}{Z_n(\beta, \gamma)} \int_B \exp[-a_n \beta h_n - a_n \gamma h_n^2] \, dP_n, \tag{2.6}
\]

which we call the Gaussian canonical ensemble. One can generalize this by replacing the quadratic function
by a continuous function $g$ that is bounded below. This gives rise to the generalized canonical ensemble,
which the theory developed in [12] allows one to treat.
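
For a finite system the definitions (2.4)-(2.6) can be evaluated directly. The following sketch assumes a
Curie-Weiss-type system of $n$ spins with uniform prior, $a_n = n$, and energy per spin $h_n = -m^2/2$, where
$m$ is the magnetization per spin; this Hamiltonian and all function names are assumptions chosen for
illustration and are not taken from the paper. Configurations are grouped by their number of up spins.

```python
import math

def levels(n):
    """List (h, prior) over magnetization levels of n spins with uniform prior."""
    out = []
    for k in range(n + 1):                # k = number of +1 spins
        m = (2 * k - n) / n               # magnetization per spin
        h = -0.5 * m * m                  # assumed energy per spin h_n
        prior = math.comb(n, k) / 2 ** n  # P_n-probability of this level
        out.append((h, prior))
    return out

def canonical_ensemble(n, beta):
    """Return (Z_n(beta), level probabilities) as in (2.4) with a_n = n."""
    weights = [p * math.exp(-n * beta * h) for h, p in levels(n)]
    z = sum(weights)
    return z, [w / z for w in weights]

def gaussian_ensemble(n, beta, gamma):
    """Return (Z_n(beta, gamma), level probabilities) as in (2.5)-(2.6) with a_n = n."""
    weights = [p * math.exp(-n * beta * h - n * gamma * h * h)
               for h, p in levels(n)]
    z = sum(weights)
    return z, [w / z for w in weights]

z_c, p_c = canonical_ensemble(20, 1.5)
z_g, p_g = gaussian_ensemble(20, 1.5, 4.0)
print(z_c, z_g)
print(p_c[0], p_g[0])  # probability of the all-down level under each ensemble
```

Setting $\gamma = 0$ in `gaussian_ensemble` reproduces the canonical ensemble, while $\gamma > 0$ reduces the weight
of the levels with the largest $h_n^2$.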

Using the theory of large deviations, one introduces the sets of equilibrium macrostates for each
ensemble. It is proved in [21, Thm. 3.2] that with respect to the microcanonical ensemble $P_n^{u,r}$, $Y_n$ satisfies
the LDP on $\mathcal{X}$, in the double limit $n \to \infty$ and $r \to 0$, with rate function

\[
  I^u(x) =
  \begin{cases} I(x) + s(u) & \text{if } H(x) = u, \\ \infty & \text{otherwise.} \end{cases} \tag{2.7}
\]

$I^u$ is nonnegative on $\mathcal{X}$, and for $u \in$ dom $s$, $I^u$ attains its infimum of 0 on the set

\[
  \mathcal{E}^u = \{x \in \mathcal{X} : I^u(x) = 0\}
        = \{x \in \mathcal{X} : I(x) \text{ is minimized subject to } H(x) = u\}. \tag{2.8}
\]

This set is precisely the set of solutions of the constrained minimization problem (1.1).

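In order to state the LDPs for the other two ensembles, we bring in the canonical free energy, defined
for $\beta \in \mathbb{R}$ by

\[
  \varphi(\beta) = -\lim_{n \to \infty} \frac{1}{a_n} \log Z_n(\beta),
\]

and the Gaussian free energy, defined for $\beta \in \mathbb{R}$ and $\gamma \ge 0$ by

\[
  \varphi(\beta, \gamma) = -\lim_{n \to \infty} \frac{1}{a_n} \log Z_n(\beta, \gamma).
\]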
It is proved in [21, Thm. 2.4] that the limit defining $\varphi(\beta)$ exists and is given by

\[
  \varphi(\beta) = \inf_{y \in \mathcal{X}} \{I(y) + \beta H(y)\} \tag{2.9}
\]

and that with respect to $P_{n,\beta}$, $Y_n$ satisfies the LDP on $\mathcal{X}$ with rate function

\[
  I_\beta(x) = I(x) + \beta H(x) - \varphi(\beta). \tag{2.10}
\]

$I_\beta$ is nonnegative on $\mathcal{X}$ and attains its infimum of 0 on the set

\[
  \mathcal{E}_\beta = \{x \in \mathcal{X} : I_\beta(x) = 0\}
            = \{x \in \mathcal{X} : I(x) + \langle \beta, H(x) \rangle \text{ is minimized}\}. \tag{2.11}
\]

This set is precisely the set of solutions of the unconstrained minimization problem (1.2).

A straightforward extension of these results shows that the limit defining $\varphi(\beta, \gamma)$ exists and is given by

\[
  \varphi(\beta, \gamma) = \inf_{y \in \mathcal{X}} \{I(y) + \beta H(y) + \gamma[H(y)]^2\} \tag{2.12}
\]

and that with respect to $P_{n,\beta,\gamma}$, $Y_n$ satisfies the LDP on $\mathcal{X}$ with rate function

\[
  I_{\beta,\gamma}(x) = I(x) + \beta H(x) + \gamma[H(x)]^2 - \varphi(\beta, \gamma). \tag{2.13}
\]

$I_{\beta,\gamma}$ is nonnegative on $\mathcal{X}$ and attains its infimum of 0 on the set

\[
  \mathcal{E}_{\beta,\gamma} = \{x \in \mathcal{X} : I_{\beta,\gamma}(x) = 0\}
                   = \{x \in \mathcal{X} : I(x) + \langle \beta, H(x) \rangle + \gamma[H(x)]^2 \text{ is minimized}\}. \tag{2.14}
\]

This set is precisely the set of solutions of the penalized minimization problem (1.5).

For $u \in$ dom $s$, let $x$ be any element of $\mathcal{X}$ satisfying $I^u(x) > 0$. The formal notation

\[
  P_n^{u,r}\{Y_n \in dx\} \asymp e^{-a_n I^u(x)}
\]

suggests that $x$ has an exponentially small probability of being observed in the limit $n \to \infty$, $r \to 0$. Hence
it makes sense to identify $\mathcal{E}^u$ with the set of microcanonical equilibrium macrostates. In the same way we
identify $\mathcal{E}_\beta$ with the set of canonical equilibrium macrostates and $\mathcal{E}_{\beta,\gamma}$ with the set of Gaussian
equilibrium macrostates. A rigorous justification is given in [21, Thm. 2.4(d)].

III. EQUIVALENCE AND NONEQUIVALENCE OF THE THREE ENSEMBLES 

Having defined the sets of equilibrium macrostates $\mathcal{E}^u$, $\mathcal{E}_\beta$, and $\mathcal{E}_{\beta,\gamma}$ for the microcanonical, canonical,
and Gaussian ensembles, we now show how these sets are related to one another. In Theorem 3.1 we state
the results proved in [21] concerning equivalence and nonequivalence of the microcanonical and canonical
ensembles. Then in Theorem 3.3 we extend these results to the Gaussian ensemble [12].

Parts (a)-(c) of Theorem 3.1 give necessary and sufficient conditions, in terms of support properties of
$s$, for equivalence and nonequivalence of $\mathcal{E}^u$ and $\mathcal{E}_\beta$. These assertions are proved in Theorems 4.4 and 4.8
in [21]. Part (a) states that $s$ has a strictly supporting line at $u$ if and only if full equivalence of ensembles
holds; i.e., if and only if there exists a $\beta$ such that $\mathcal{E}^u = \mathcal{E}_\beta$. The most surprising result, given in part (c),
is that $s$ has no supporting line at $u$ if and only if nonequivalence of ensembles holds in the strong sense
that $\mathcal{E}^u \cap \mathcal{E}_\beta = \emptyset$ for all $\beta$. Part (c) is to be contrasted with part (d), which states that for any $\beta$ canonical
equilibrium macrostates can always be realized microcanonically. Part (d) is proved in Theorem 4.6 in
[21]. Thus one conclusion of this theorem is that at the level of equilibrium macrostates the microcanonical
ensemble is the richer of the two ensembles.
Theorem 3.1. In parts (a), (b), and (c), $u$ denotes any point in dom $s$.

(a) Full equivalence. There exists $\beta$ such that $\mathcal{E}^u = \mathcal{E}_\beta$ if and only if $s$ has a strictly supporting line at
$u$ with slope $\beta$; i.e.,

\[
  s(v) < s(u) + \beta(v - u) \quad \text{for all } v \neq u.
\]

(b) Partial equivalence. There exists $\beta$ such that $\mathcal{E}^u \subset \mathcal{E}_\beta$ but $\mathcal{E}^u \neq \mathcal{E}_\beta$ if and only if $s$ has a nonstrictly
supporting line at $u$ with slope $\beta$; i.e.,

\[
  s(v) \le s(u) + \beta(v - u) \quad \text{for all } v, \text{ with equality for some } v \neq u.
\]

(c) Nonequivalence. For all $\beta$, $\mathcal{E}^u \cap \mathcal{E}_\beta = \emptyset$ if and only if $s$ has no supporting line at $u$; i.e.,

\[
  \text{for all } \beta \text{ there exists } v \text{ such that } s(v) > s(u) + \beta(v - u).
\]

(d) Canonical is always realized microcanonically. For any $\beta \in \mathbb{R}$ we have $H(\mathcal{E}_\beta) \subset$ dom $s$ and

\[
  \mathcal{E}_\beta = \bigcup_{u \in H(\mathcal{E}_\beta)} \mathcal{E}^u.
\]

We highlight several features of the theorem in order to illuminate their physical content. In part (a) we assume that for a given $u \in$ dom $s$ there exists a unique $\beta$ such that $\mathcal{E}^u = \mathcal{E}_\beta$. If $s$ is differentiable at $u$ and $s$ and the double Legendre-Fenchel transform $s^{**}$ are equal in a neighborhood of $u$, then $\beta$ is given by the standard thermodynamic formula $\beta = s'(u)$ [12, Thm. A.4(b)]. The inverse relationship can be obtained from part (d) of the theorem under the assumption that $\mathcal{E}_\beta$ consists of a unique macrostate or, more generally, that for all $x \in \mathcal{E}_\beta$ the values $H(x)$ are equal. Then $\mathcal{E}_\beta = \mathcal{E}^{u(\beta)}$, where $u(\beta) = H(x)$ for any $x \in \mathcal{E}_\beta$; $u(\beta)$ denotes the mean energy realized at equilibrium in the canonical ensemble. The relationship $u = u(\beta)$ inverts the relationship $\beta = s'(u)$. Partial ensemble equivalence can be seen in part (d) under the assumption that for a given $\beta$, $\mathcal{E}_\beta$ can be partitioned into at least two sets $\mathcal{E}_{\beta,i}$ such that for all $x \in \mathcal{E}_{\beta,i}$ the values $H(x)$ are equal but $H(x) \ne H(y)$ whenever $x \in \mathcal{E}_{\beta,i}$ and $y \in \mathcal{E}_{\beta,j}$ for $i \ne j$. Then $\mathcal{E}_\beta = \bigcup_i \mathcal{E}^{u_i(\beta)}$, where $u_i(\beta) = H(x)$, $x \in \mathcal{E}_{\beta,i}$. Clearly, for each $i$, $\mathcal{E}^{u_i(\beta)} \subset \mathcal{E}_\beta$ but $\mathcal{E}^{u_i(\beta)} \ne \mathcal{E}_\beta$. Physically, this corresponds to a situation of coexisting phases that normally takes place at a first-order phase transition [55].

Before continuing with our analysis of ensemble equivalence, we make a number of basic definitions. A function $f$ on $\mathbb{R}$ is said to be concave on $\mathbb{R}$ if $f$ maps $\mathbb{R}$ into $\mathbb{R} \cup \{-\infty\}$, $f \not\equiv -\infty$, and for all $u$ and $v$ in $\mathbb{R}$ and all $\lambda \in (0, 1)$

\[ f(\lambda u + (1 - \lambda)v) \ge \lambda f(u) + (1 - \lambda) f(v). \]

Let $f \not\equiv -\infty$ be a function mapping $\mathbb{R}$ into $\mathbb{R} \cup \{-\infty\}$. We define dom $f$ to be the set of $u$ for which $f(u) > -\infty$. For $\beta$ and $u$ in $\mathbb{R}$ the Legendre-Fenchel transforms $f^*$ and $f^{**}$ are defined by

\[ f^*(\beta) = \inf_{u \in \mathbb{R}} \{\langle \beta, u \rangle - f(u)\} \quad \text{and} \quad f^{**}(u) = \inf_{\beta \in \mathbb{R}} \{\langle \beta, u \rangle - f^*(\beta)\}. \]

The function $f^*$ is concave and upper semicontinuous on $\mathbb{R}$, and for all $u$ we have $f^{**}(u) = f(u)$ if and only if $f$ is concave and upper semicontinuous on $\mathbb{R}$ [19, Thm. VI.5.3]. When $f$ is not concave and upper semicontinuous, then $f^{**}$ is the smallest concave, upper semicontinuous function on $\mathbb{R}$ that satisfies $f^{**}(u) \ge f(u)$ for all $u \in \mathbb{R}$ [12, Prop. A.2]. In particular, if for some $u$, $f(u) \ne f^{**}(u)$, then $f(u) < f^{**}(u)$.
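As a numerical illustration of these definitions, the following sketch evaluates $f^*$ and $f^{**}$ on finite grids for a toy nonconcave function. The function `f`, the grids, and the helper name `legendre_fenchel` are illustrative assumptions and do not come from the paper; the infima over $\mathbb{R}$ are approximated by minima over grid points, so the result is only a grid approximation.

```python
import numpy as np

def legendre_fenchel(vals, xs, ys):
    """Grid approximation of g(y) = inf_x { y*x - vals(x) } for each y in ys."""
    # Rows index y, columns index x; the inf is replaced by a min over the grid xs.
    return np.min(ys[:, None] * xs[None, :] - vals[None, :], axis=1)

# Toy nonconcave function (an assumption for illustration only):
# two maxima at u = -1 and u = 1, a local minimum at u = 0.
u = np.linspace(-1.5, 1.5, 601)
f = -(u**2 - 1.0) ** 2

beta = np.linspace(-20.0, 20.0, 4001)             # slopes wide enough for this f
f_star = legendre_fenchel(f, u, beta)             # f*(beta)  = inf_u    { beta*u - f(u) }
f_star_star = legendre_fenchel(f_star, beta, u)   # f**(u)    = inf_beta { beta*u - f*(beta) }

# gap(u) = f**(u) - f(u) is nonnegative; it vanishes where f is concave at u
# and is positive where f is not concave at u.
gap = f_star_star - f
```

For this toy function the gap is positive on the interval between the two maxima and vanishes outside it, which is the grid analogue of the statement that $f(u) < f^{**}(u)$ wherever $f(u) \ne f^{**}(u)$.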

Let $f \not\equiv -\infty$ be a function mapping $\mathbb{R}$ into $\mathbb{R} \cup \{-\infty\}$, $u$ a point in dom $f$, and $K$ a convex subset of dom $f$. We have the following four additional definitions: $f$ is concave at $u$ if $f(u) = f^{**}(u)$; $f$ is not concave at $u$ if $f(u) < f^{**}(u)$; $f$ is concave on $K$ if $f$ is concave at all $u \in K$; and $f$ is strictly concave on $K$ if for all $u \ne v$ in $K$ and all $\lambda \in (0, 1)$

\[ f(\lambda u + (1 - \lambda)v) > \lambda f(u) + (1 - \lambda) f(v). \]

We also introduce two sets that play a central role in the theory. Let $f$ be a concave function on $\mathbb{R}$ whose domain is an interval having nonempty interior. For $u \in \mathbb{R}$ the superdifferential of $f$ at $u$, denoted by $\partial f(u)$, is defined to be the set of $\beta$ such that $\beta$ is the slope of a supporting line of $f$ at $u$. Any such $\beta$ is called a supergradient of $f$ at $u$. Thus, if $f$ is differentiable at $u \in$ int(dom $f$), then $\partial f(u)$ consists of the unique point $\beta = f'(u)$. If $f$ is not differentiable at $u \in$ int(dom $f$), then $\partial f(u)$ consists of all $\beta$ satisfying the inequalities

\[ (f')^+(u) \le \beta \le (f')^-(u), \]

where $(f')^-(u)$ and $(f')^+(u)$ denote the left-hand and right-hand derivatives of $f$ at $u$. The domain of $\partial f$, denoted by dom $\partial f$, is then defined to be the set of $u$ for which $\partial f(u) \ne \emptyset$.

Complications arise because dom $\partial f$ can be a proper subset of dom $f$, as simple examples clearly show. Let $b$ be a boundary point of dom $f$ for which $f(b) > -\infty$. Then $b$ is in dom $\partial f$ if and only if the one-sided derivative of $f$ at $b$ is finite. For example, if $b$ is a left hand boundary point of dom $f$ and $(f')^+(b)$ is finite, then $\partial f(b) = [(f')^+(b), \infty)$; any $\beta \in \partial f(b)$ is the slope of a supporting line at $b$. The possible discrepancy between dom $\partial f$ and dom $f$ introduces unavoidable technicalities in the statements of several results concerning the existence of supporting lines.

One of our goals is to find concavity and support conditions on the microcanonical entropy guaranteeing that the microcanonical and canonical ensembles are fully equivalent at all points $u \in$ dom $s$ except possibly boundary points. If this is the case, then we say that the ensembles are universally equivalent. Here is a basic result in that direction. The universal equivalence stated in part (b) follows from part (a) and from part (a) of Theorem 3.1. The rest of the theorem depends on facts concerning concave functions [12, p. 1305].

Theorem 3.2. Assume that dom $s$ is an interval having nonempty interior and that $s$ is strictly concave on int(dom $s$) and continuous on dom $s$. The following conclusions hold.

(a) $s$ has a strictly supporting line at all $u \in$ dom $s$ except possibly boundary points.

(b) The microcanonical and canonical ensembles are universally equivalent; i.e., fully equivalent at all $u \in$ dom $s$ except possibly boundary points.

(c) $s$ is concave on $\mathbb{R}$, and for each $u$ in part (b) the corresponding $\beta$ in the statement of full equivalence is any element of $\partial s(u)$.

(d) If $s$ is differentiable at some $u \in$ dom $s$, then the corresponding $\beta$ in part (b) is unique and is given by the standard thermodynamic formula $\beta = s'(u)$.

The next theorem extends Theorem 3.1 by giving equivalence and nonequivalence results involving $\mathcal{E}^u$ and $\mathcal{E}_{\beta,\gamma}$, the sets of equilibrium macrostates with respect to the microcanonical and Gaussian ensembles. The chief innovation is that $s(u)$ in Theorem 3.1 is replaced here by the generalized microcanonical entropy $s(u) - \gamma u^2$. As we point out after the statement of Theorem 3.3, for the purpose of applications part (a) is its most important contribution. The usefulness of Theorem 3.3 is matched by the simplicity with which it follows from Theorem 3.1. Theorem 3.3 is a special case of Theorem 3.4 in [12], obtained by specializing the generalized canonical ensemble and the associated set of equilibrium macrostates to the Gaussian ensemble and the set $\mathcal{E}_{\beta,\gamma}$ of Gaussian equilibrium macrostates.

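Theorem 3.3. Given $\gamma \ge 0$, define $s_\gamma(u) = s(u) - \gamma u^2$. In parts (a), (b), and (c), $u$ denotes any point in dom $s$.

(a) Full equivalence. There exists $\beta$ such that $\mathcal{E}^u = \mathcal{E}_{\beta,\gamma}$ if and only if $s_\gamma$ has a strictly supporting line at $u$ with slope $\beta$.

(b) Partial equivalence. There exists $\beta$ such that $\mathcal{E}^u \subset \mathcal{E}_{\beta,\gamma}$ but $\mathcal{E}^u \ne \mathcal{E}_{\beta,\gamma}$ if and only if $s_\gamma$ has a nonstrictly supporting line at $u$ with slope $\beta$.

(c) Nonequivalence. For all $\beta$, $\mathcal{E}^u \cap \mathcal{E}_{\beta,\gamma} = \emptyset$ if and only if $s_\gamma$ has no supporting line at $u$.

(d) Gaussian is always realized microcanonically. For any $\beta$ we have $H(\mathcal{E}_{\beta,\gamma}) \subset$ dom $s$ and

\[ \mathcal{E}_{\beta,\gamma} = \bigcup_{u \in H(\mathcal{E}_{\beta,\gamma})} \mathcal{E}^u. \]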

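Proof. For $\gamma \ge 0$ and $B \in \mathcal{F}_n$ define a new probability measure

\[ P_{n,\gamma}\{B\} = \frac{1}{\int \exp[-a_n \gamma h_n^2] \, dP_n} \int_B \exp[-a_n \gamma h_n^2] \, dP_n. \]

With respect to $P_{n,\gamma}$, $Y_n$ satisfies the LDP on $\mathcal{X}$ with rate function

\[ I_\gamma(x) = I(x) + \gamma [H(x)]^2 - \psi(\gamma), \]

where $\psi(\gamma) = \inf_{y \in \mathcal{X}} \{I(y) + \gamma [H(y)]^2\}$. Replacing the prior measure $P_n$ in the canonical ensemble with $P_{n,\gamma}$ gives the Gaussian ensemble $P_{n,\beta,\gamma}$, which has $\mathcal{E}_{\beta,\gamma}$ as the associated set of equilibrium macrostates. On the other hand, replacing the prior measure $P_n$ in the microcanonical ensemble with $P_{n,\gamma}$ gives

\[ P_{n,\gamma}^{u,r}\{B\} = P_{n,\gamma}\{B \mid h_n \in [u - r, u + r]\}. \]

By continuity, for $\omega$ satisfying $h_n(\omega) \in [u - r, u + r]$, $\gamma [h_n(\omega)]^2$ converges to $\gamma u^2$ uniformly in $\omega$ and $n$ as $r \to 0$. It follows that with respect to $P_{n,\gamma}^{u,r}$, $Y_n$ satisfies the LDP on $\mathcal{X}$, in the double limit $n \to \infty$ and $r \to 0$, with the same rate function $I^u$ as in the LDP for $Y_n$ with respect to $P_n^{u,r}$. As a result, the set of equilibrium macrostates corresponding to $P_{n,\gamma}^{u,r}$ coincides with the set $\mathcal{E}^u$ of microcanonical equilibrium macrostates.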

It follows from parts (a)-(c) of Theorem 3.1 that all equivalence and nonequivalence relationships between $\mathcal{E}^u$ and $\mathcal{E}_{\beta,\gamma}$ are expressed in terms of support properties of the function $\tilde{s}_\gamma$ obtained from $s$ by replacing the rate function $I$ by the new rate function $I_\gamma$. The function $\tilde{s}_\gamma$ is given by

\[ \tilde{s}_\gamma(u) = -\inf\{I_\gamma(x) : x \in \mathcal{X}, H(x) = u\} \]
\[ = -\inf\{I(x) + \gamma [H(x)]^2 : x \in \mathcal{X}, H(x) = u\} + \psi(\gamma) \]
\[ = s(u) - \gamma u^2 + \psi(\gamma). \]

Since $\tilde{s}_\gamma(u)$ differs from $s_\gamma(u) = s(u) - \gamma u^2$ by the constant $\psi(\gamma)$, we conclude that all equivalence and nonequivalence relationships between $\mathcal{E}^u$ and $\mathcal{E}_{\beta,\gamma}$ are expressed in terms of the same support properties of $s_\gamma$. This completes the derivation of parts (a)-(c) of Theorem 3.3 from parts (a)-(c) of Theorem 3.1. Similarly, part (d) of Theorem 3.3 follows from part (d) of Theorem 3.1. ■

The importance of part (a) of Theorem 3.3 in applications is emphasized by the following theorem, which will be applied in the sequel. This theorem is the analogue of Theorem 3.2 for the Gaussian ensemble, $s$ in that theorem being replaced by $s_\gamma$. The functions $s$ and $s_\gamma$ have the same domains. The universal equivalence stated in part (b) of the next theorem follows from part (a) and from part (a) of Theorem 3.3.

Theorem 3.4. For $\gamma \ge 0$, define $s_\gamma(u) = s(u) - \gamma u^2$. Assume that dom $s$ is an interval having nonempty interior and that $s_\gamma$ is strictly concave on int(dom $s$) and continuous on dom $s$. The following conclusions hold.

(a) $s_\gamma$ has a strictly supporting line at all $u \in$ dom $s$ except possibly boundary points.

(b) The microcanonical ensemble and the Gaussian ensemble defined in terms of this $\gamma$ are universally equivalent; i.e., fully equivalent at all $u \in$ dom $s$ except possibly boundary points.

(c) $s_\gamma$ is concave on $\mathbb{R}$, and for each $u$ in part (b) the corresponding $\beta$ in the statement of full equivalence is any element of $\partial s_\gamma(u)$.

(d) If $s_\gamma$ is differentiable at some $u \in$ dom $s$, then the corresponding $\beta$ in part (b) is unique and is given by the thermodynamic formula $\beta = s_\gamma'(u)$.

The most important repercussion of Theorem 3.4 is the ease with which one can prove that the microcanonical and Gaussian ensembles are universally equivalent in those cases in which the microcanonical and canonical ensembles are not fully or partially equivalent. This rests mainly on part (b) of Theorem 3.4, which states that universal equivalence of ensembles holds if there exists a $\gamma \ge 0$ such that $s_\gamma$ is strictly concave on int(dom $s$). The existence of such a $\gamma$ follows from a natural set of hypotheses on $s$ stated in Theorem 4.2 in the next section.

IV. UNIVERSAL EQUIVALENCE VIA THE GENERALIZED CANONICAL ENSEMBLE 

This section addresses a basic foundational issue in statistical mechanics. Under the assumption that the microcanonical entropy is $C^2$ and $s''$ is bounded above, we show in Theorem 4.2 that when the canonical ensemble is nonequivalent to the microcanonical ensemble on a subset of energy values $u$, it can often be replaced by a Gaussian ensemble that is universally equivalent to the microcanonical ensemble; i.e., fully equivalent at all $u \in$ dom $s$ except possibly boundary points. Theorem 4.3 is a weaker version that can often be applied when $s''$ is not bounded above. In the last section of the paper, these results will be illustrated in the context of the Curie-Weiss-Potts lattice-spin model.

In Theorem 4.2 the strategy is to find a quadratic function $\gamma u^2$ such that $s_\gamma(u) = s(u) - \gamma u^2$ is strictly concave on int(dom $s$) and continuous on dom $s$. Parts (a) and (b) of Theorem 3.4 then yield the universal equivalence. As the next proposition shows, an advantage of working with quadratic functions is that support properties of $s_\gamma$ involving a supporting line are equivalent to support properties of $s$ involving a supporting parabola defined in terms of $\gamma$. This observation gives a geometrically intuitive way to find a quadratic function guaranteeing universal ensemble equivalence.

In order to state the proposition, we need a definition. Let $f$ be a function mapping $\mathbb{R}$ into $\mathbb{R} \cup \{-\infty\}$, $u$ and $\beta$ points in $\mathbb{R}$, and $\gamma \ge 0$. We say that $f$ has a supporting parabola at $u$ with parameters $(\beta, \gamma)$ if

\[ f(v) \le f(u) + \langle \beta, v - u \rangle + \gamma (v - u)^2 \quad \text{for all } v. \qquad (4.1) \]

The parabola is said to be strictly supporting if the inequality is strict for all $v \ne u$.

Proposition 4.1. $f$ has a (strictly) supporting parabola at $u$ with parameters $(\beta, \gamma)$ if and only if $f - \gamma(\cdot)^2$ has a (strictly) supporting line at $u$ with slope $\tilde{\beta}$. The quantities $\beta$ and $\tilde{\beta}$ are related by $\tilde{\beta} = \beta - 2\gamma u$.

Proof. The proof is based on the identity $(v - u)^2 = v^2 - 2u(v - u) - u^2$. If $f$ has a strictly supporting parabola at $u$ with parameters $(\beta, \gamma)$, then for all $v \ne u$

\[ f(v) - \gamma v^2 < f(u) - \gamma u^2 + \tilde{\beta}(v - u), \]

where $\tilde{\beta} = \beta - 2\gamma u$. Thus $f - \gamma(\cdot)^2$ has a strictly supporting line at $u$ with slope $\tilde{\beta}$. The converse is proved similarly, as is the case in which the supporting line or parabola is supporting but not strictly supporting. ■
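For completeness, the algebra behind the displayed inequality can be written out as follows. This is only an expanded form of the step used in the proof, with $\tilde{\beta} = \beta - 2\gamma u$ as in the proposition; nothing new is assumed.

```latex
\begin{align*}
f(v) &< f(u) + \beta (v-u) + \gamma (v-u)^2
      && \text{(strictly supporting parabola, } v \ne u\text{)} \\
     &= f(u) + \beta (v-u) + \gamma \bigl[ v^2 - 2u(v-u) - u^2 \bigr]
      && \text{(identity for } (v-u)^2\text{)} \\
\Longleftrightarrow \quad
f(v) - \gamma v^2 &< f(u) - \gamma u^2 + (\beta - 2\gamma u)(v-u) \\
     &= f(u) - \gamma u^2 + \tilde{\beta}\,(v-u).
\end{align*}
```

Every step is reversible, which is why the converse direction follows by reading the chain from bottom to top.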

The first application of Theorem 3.4 is Theorem 4.2, which gives a criterion guaranteeing the existence of a quadratic function $\gamma u^2$ such that $s_\gamma(u) = s(u) - \gamma u^2$ is strictly concave on dom $s$. The criterion, that $s''$ is bounded above on the interior of dom $s$, is essentially optimal for the existence of a fixed quadratic function $\gamma u^2$ guaranteeing the strict concavity of $s_\gamma$. The situation in which $s''$ is not bounded above on the interior of dom $s$ can often be handled by Theorem 4.3, which is a local version of Theorem 4.2.

Theorem 4.2. Assume that dom $s$ is an interval having nonempty interior. Assume also that $s$ is continuous on dom $s$, $s$ is twice continuously differentiable on int(dom $s$), and $s''$ is bounded above on int(dom $s$). Then for all sufficiently large $\gamma \ge 0$, conclusions (a)-(c) hold. Specifically, if $s$ is strictly concave on dom $s$, then we choose any $\gamma \ge 0$, and otherwise we choose

\[ \gamma > \gamma_0 = \frac{1}{2} \cdot \sup_{u \in \mathrm{int}(\mathrm{dom}\, s)} s''(u). \qquad (4.2) \]

(a) $s_\gamma(u) = s(u) - \gamma u^2$ is strictly concave and continuous on dom $s$.

(b) $s_\gamma$ has a strictly supporting line, and $s$ has a strictly supporting parabola, at all $u \in$ dom $s$ except possibly boundary points. At a boundary point $s_\gamma$ has a strictly supporting line, and $s$ has a strictly supporting parabola, if and only if the one-sided derivative of $s_\gamma$ is finite at that boundary point.

(c) The microcanonical ensemble and the Gaussian ensemble defined in terms of this $\gamma$ are universally equivalent; i.e., fully equivalent at all $u \in$ dom $s$ except possibly boundary points. For all $u \in$ int(dom $s$) the value of $\beta$ defining the universally equivalent Gaussian ensemble is unique and is given by $\beta = s'(u) - 2\gamma u$.

Proof. (a) If $s$ is strictly concave on dom $s$, then $s_\gamma$ is also strictly concave on this set for any $\gamma \ge 0$. We now consider the case in which $s$ is not strictly concave on dom $s$. For any $\gamma \ge 0$, $s_\gamma$ is continuous on dom $s$. If, in addition, we choose $\gamma > \gamma_0$ in accordance with (4.2), then for all $u \in$ int(dom $s$)

\[ s_\gamma''(u) = s''(u) - 2\gamma < 0. \]

A straightforward extension of the proof of Theorem 4.4 in J^, in which the inequalities in the first two displays are replaced by strict inequalities, shows that $-s_\gamma$ is strictly convex on int(dom $s$) and thus that $s_\gamma$ is strictly concave on int(dom $s$). If $s_\gamma$ is not strictly concave on dom $s$, then $s_\gamma$ must be affine on an interval. Since this violates the strict concavity on int(dom $s$), part (a) is proved.

(b) The first assertion follows from part (a) of the present theorem, part (a) of Theorem 3.4, and Proposition 4.1. Concerning the second assertion about boundary points, the reader is referred to the discussion before Theorem 3.2.

(c) The universal equivalence of the two ensembles is a consequence of part (a) of the present theorem and part (b) of Theorem 3.4. The full equivalence of the ensembles at all $u \in$ int(dom $s$) is equivalent to the existence of a strictly supporting line at each $u \in$ int(dom $s$) [Thm. 3.3(a)]. Since $s_\gamma$ is differentiable at all $u \in$ int(dom $s$), for each $u$ the slope of the strictly supporting line at $u$ is unique and equals $s_\gamma'(u)$ [12, Thm. A.1(b)]. ■
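The choice of $\gamma$ in (4.2) can be illustrated numerically. The sketch below uses a toy nonconcave function in place of the microcanonical entropy; the function `s`, its derivatives, the grid, and the helper name `gamma_0` are all assumptions made for illustration, and the supremum in (4.2) is approximated by a maximum over the grid.

```python
import numpy as np

def gamma_0(s_second, grid):
    """Grid estimate of gamma_0 = (1/2) * sup s''(u), following (4.2)."""
    return 0.5 * np.max(s_second(grid))

def s(u):
    """Toy nonconcave 'entropy' (illustrative assumption, not the paper's s)."""
    return -(u**2 - 1.0) ** 2

def s_prime(u):
    return -4.0 * u * (u**2 - 1.0)

def s_second(u):
    return 4.0 - 12.0 * u**2

u = np.linspace(-1.5, 1.5, 601)
g0 = gamma_0(s_second, u)     # sup of s'' is 4 (attained at u = 0), so g0 is 2
gamma = g0 + 0.5              # any gamma > gamma_0 is admissible

# s_gamma''(u) = s''(u) - 2*gamma must be negative on the whole grid.
s_gamma_second = s_second(u) - 2.0 * gamma
is_strictly_concave = bool(np.all(s_gamma_second < 0.0))

# Slope of the strictly supporting line of s_gamma at u, as in part (c):
beta = s_prime(u) - 2.0 * gamma * u
```

Because $s_\gamma$ is strictly concave for this $\gamma$, the map $u \mapsto \beta$ is strictly decreasing, so each $u$ in the interior corresponds to exactly one $\beta$.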

Suppose that $s$ is $C^2$ on the interior of dom $s$ but $s''$ is not bounded above. This arises, for example, in the Curie-Weiss-Potts model, in which dom $s$ is a closed, bounded interval of $\mathbb{R}$ and $s''(u) \to \infty$ as $u$ approaches the right hand endpoint of dom $s$ [see Section V]. In such cases one cannot expect that the conclusions of Theorem 4.2 will be satisfied; in particular, that there exists $\gamma \ge 0$ such that $s_\gamma(u) = s(u) - \gamma u^2$ has a strictly supporting line at each point of the interior of dom $s$ and thus that the ensembles are universally equivalent.

In order to overcome this difficulty, we introduce Theorem 4.3, a local version of Theorem 4.2. Theorem 4.3 handles the case in which $s$ is $C^2$ on an open set $K$ but either $K$ is not all of int(dom $s$) or $K$ = int(dom $s$) and $s''$ is not bounded above on $K$. In neither of these situations are the hypotheses of Theorem 4.2 satisfied.

In Theorem 4.3 other hypotheses are given guaranteeing that for each $u \in K$ there exists $\gamma$ such that $s_\gamma$ has a strictly supporting line at $u$; in general, $\gamma$ depends on $u$. However, with the same $\gamma$, $s_\gamma$ might also have a strictly supporting line at other values of $u$. In general, as one increases $\gamma$, the set of $u$ at which $s_\gamma$ has a strictly supporting line cannot decrease. Because of part (a) of Theorem 3.3, this can be restated in terms of ensemble equivalence involving the set $\mathcal{E}_{\beta,\gamma}$ of Gaussian equilibrium macrostates. Defining

\[ F_\gamma = \{u \in K : \text{there exists } \beta \text{ such that } \mathcal{E}_{\beta,\gamma} = \mathcal{E}^u\}, \]

we have $F_{\gamma_1} \subset F_{\gamma_2}$ whenever $\gamma_2 > \gamma_1$ and, because of Theorem 4.3, $\bigcup_{\gamma > 0} F_\gamma = K$. This phenomenon is investigated in Section V for the Curie-Weiss-Potts model.

In order to state Theorem 4.3, we define for $u \in K$ and $\lambda \ge 0$

\[ D(u, s'(u), \lambda) = \{v \in \mathrm{dom}\, s : s(v) \ge s(u) + s'(u)(v - u) + \lambda (v - u)^2\}. \]

Geometrically, this set contains all points for which the parabola with parameters $(s'(u), \lambda)$ passing through $(u, s(u))$ lies below the graph of $s$. Clearly, since $\lambda \ge 0$, we have $D(u, s'(u), \lambda) \subset D(u, s'(u), 0)$; the set $D(u, s'(u), 0)$ contains all points for which the graph of the line with slope $s'(u)$ passing through $(u, s(u))$ lies below the graph of $s$. Thus, in the next theorem the hypothesis that for each $u \in K$ the set $D(u, s'(u), \lambda)$ is bounded for some $\lambda \ge 0$ is satisfied if dom $s$ is bounded or, more generally, if $D(u, s'(u), 0)$ is bounded. The latter set is bounded if, for example, $-s$ is superlinear; i.e.,

\[ \lim_{|v| \to \infty} s(v)/|v| = -\infty. \]

The quantity $\gamma_0(u)$ appearing in the next theorem is defined in equation (5.7) in [12].

Theorem 4.3. Let $K$ be an open subset of dom $s$ and assume that $s$ is twice continuously differentiable on $K$. Assume also that dom $s$ is bounded or, more generally, that for every $u \in K$ there exists $\lambda \ge 0$ such that $D(u, s'(u), \lambda)$ is bounded. Then for each $u \in K$ there exists $\gamma_0(u) \ge 0$ with the following properties.

(a) For each $u \in K$ and any $\gamma > \gamma_0(u)$, $s$ has a strictly supporting parabola at $u$ with parameters $(s'(u), \gamma)$.

(b) For each $u \in K$ and any $\gamma > \gamma_0(u)$, $s_\gamma = s - \gamma(\cdot)^2$ has a strictly supporting line at $u$ with slope $s'(u) - 2\gamma u$.

(c) For each $u \in K$ and any $\gamma > \gamma_0(u)$, the microcanonical ensemble and the Gaussian ensemble defined in terms of this $\gamma$ are fully equivalent at $u$. The value of $\beta$ defining the Gaussian ensemble is unique and is given by $\beta = s'(u) - 2\gamma u$.

Comments on the Proof. (a) We first choose a parabola that is strictly supporting in a neighborhood of $u$ and then adjust $\gamma$ so that the parabola becomes strictly supporting on all of $\mathbb{R}$. Proposition 4.1 guarantees that $s - \gamma(\cdot)^2$ has a strictly supporting line at $u$. Details are given in [12, pp. 1319-1321].

(b) This follows from part (a) of the present theorem and Proposition 4.1.

(c) For $u \in K$ the full equivalence of the ensembles follows from part (b) of the present theorem and part (a) of Theorem 3.3. The value of $\beta$ defining the fully equivalent Gaussian ensemble is determined by a routine argument given in [12, p. 1321]. ■
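The dependence of the admissible $\gamma$ on $u$ can be explored numerically. The sketch below computes, for each grid point $u$, the smallest $\gamma \ge 0$ for which the parabola with parameters $(s'(u), \gamma)$ lies on or above the sampled graph of a toy function. This grid threshold is a stand-in chosen for illustration; it is not claimed to coincide with the quantity $\gamma_0(u)$ of the theorem, and the function, grid, and helper name `parabola_threshold` are assumptions.

```python
import numpy as np

def parabola_threshold(s_vals, s_prime_vals, grid):
    """For each grid point u, return the smallest gamma >= 0 such that
    s(v) <= s(u) + s'(u)(v - u) + gamma (v - u)^2 for all grid points v."""
    dv = grid[None, :] - grid[:, None]            # dv[i, j] = v_j - u_i
    excess = s_vals[None, :] - s_vals[:, None] - s_prime_vals[:, None] * dv
    with np.errstate(divide="ignore", invalid="ignore"):
        # Skip v = u, where the ratio is 0/0.
        ratio = np.where(dv != 0.0, excess / dv**2, -np.inf)
    return np.maximum(ratio.max(axis=1), 0.0)

# Toy nonconcave function and its exact derivative (illustrative assumptions).
u = np.linspace(-1.5, 1.5, 601)
s_vals = -(u**2 - 1.0) ** 2
s_prime_vals = -4.0 * u * (u**2 - 1.0)

g = parabola_threshold(s_vals, s_prime_vals, u)

# Grid analogue of F_gamma: points u at which the parabola with this gamma
# is strictly supporting on the grid.
F_1 = g < 1.0
F_3 = g < 3.0
```

In this sketch the sets grow with $\gamma$ by construction: every point of `F_1` belongs to `F_3`, mirroring the inclusion $F_{\gamma_1} \subset F_{\gamma_2}$ for $\gamma_2 > \gamma_1$.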

Theorem 4.3 suggests an extended form of the notion of universal equivalence of ensembles. In Theorem 4.2 we are able to achieve full equivalence of ensembles for all $u \in$ dom $s$ except possibly boundary points by choosing an appropriate $\gamma$ that is valid for all $u$. This leads to the observation that the microcanonical ensemble and the Gaussian ensemble defined in terms of this $\gamma$ are universally equivalent. In Theorem 4.3 we can also achieve full equivalence of ensembles for all $u \in K$. However, in contrast to Theorem 4.2, the choice of $\gamma$ for which the two ensembles are fully equivalent depends on $u$. We summarize the ensemble equivalence property articulated in part (c) of Theorem 4.3 by saying that relative to the set of quadratic functions, the microcanonical and Gaussian ensembles are universally equivalent on the open set $K$ of energy values.

We complete our discussion of the generalized canonical ensemble and its equivalence with the microcanonical ensemble by noting that the smoothness hypothesis on $s$ in Theorem 4.3 is essentially satisfied whenever the microcanonical ensemble exhibits no phase transition at any $u \in K$. In order to see this, we recall that a point $u_c$ at which $s$ is not differentiable represents a first-order, microcanonical phase transition [55, Fig. 3]. In addition, a point $u_c$ at which $s$ is differentiable but not twice differentiable represents a second-order, microcanonical phase transition [55, Fig. 4]. It follows that $s$ is smooth on any open set $K$ not containing such phase-transition points. Hence, if the other conditions in Theorem 4.3 are valid, then the microcanonical and Gaussian ensembles are universally equivalent on $K$ relative to the set of quadratic functions. In particular, if the microcanonical ensemble exhibits no phase transitions, then $s$ is smooth on all of int(dom $s$). This implies the universal equivalence of the two ensembles provided that the other conditions in Theorem 4.2 are valid.

In the next section we apply the results in this paper to the Curie-Weiss-Potts model.

V. APPLICATIONS TO THE CURIE-WEISS-POTTS MODEL

The Curie-Weiss-Potts model is a mean-field approximation to the nearest-neighbor Potts model, which takes its place next to the Ising model as one of the most versatile models in equilibrium statistical mechanics. Although the Curie-Weiss-Potts model is considerably simpler to analyze, it is an excellent model to illustrate the general theory presented in this paper, lying at the boundary of the set of models for which a complete analysis involving explicit formulas is available. As we will see, there exists an interval $N$ such that for any $u \in N$ the microcanonical ensemble is nonequivalent to the canonical ensemble. The main result, stated in Theorem 5.2, is that for any $u \in N$ there exists $\gamma \ge 0$ such that the microcanonical ensemble and the Gaussian ensemble defined in terms of this $\gamma$ are fully equivalent for all $v < u$. While not as strong as universal equivalence, the ensemble equivalence proved in Theorem 5.2 is considerably stronger than the local equivalence stated in Theorem 4.3.

Let $q \ge 3$ be a fixed integer and define $\Lambda = \{\theta^1, \theta^2, \ldots, \theta^q\}$, where the $\theta^i$ are any $q$ distinct vectors in $\mathbb{R}^q$. In the definition of the Curie-Weiss-Potts model, the precise values of these vectors are immaterial. For each $n \in \mathbb{N}$ the model is defined by spin random variables $\omega_1, \omega_2, \ldots, \omega_n$ that take values in $\Lambda$. The ensembles for the model are defined in terms of probability measures on the configuration spaces $\Lambda^n$, which consist of the microstates $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$. We also introduce the $n$-fold product measure $P_n$ on $\Lambda^n$ with identical one-dimensional marginals

\[ \bar{\rho} = \frac{1}{q} \sum_{i=1}^q \delta_{\theta^i}. \]

Thus for all $\omega \in \Lambda^n$, $P_n(\omega) = 1/q^n$. For $n \in \mathbb{N}$ and $\omega \in \Lambda^n$ the Hamiltonian for the $q$-state Curie-Weiss-Potts model is defined by

\[ H_n(\omega) = -\frac{1}{2n} \sum_{j,k=1}^n \delta(\omega_j, \omega_k), \]

where $\delta(\omega_j, \omega_k)$ equals 1 if $\omega_j = \omega_k$ and equals 0 otherwise. The energy per particle is defined by $h_n(\omega) = \frac{1}{n} H_n(\omega)$.

With this choice of $h_n$ and with $a_n = n$, the microcanonical, canonical, and Gaussian ensembles for the model are the probability measures on $\Lambda^n$ defined as in (2.3), (2.4), and (2.6). The key to our analysis of the Curie-Weiss-Potts model is to express $h_n$ in terms of the macroscopic variables

\[ L_n = L_n(\omega) = (L_{n,1}(\omega), L_{n,2}(\omega), \ldots, L_{n,q}(\omega)), \]

the $i$th component of which is defined by

\[ L_{n,i}(\omega) = \frac{1}{n} \sum_{j=1}^n \delta(\omega_j, \theta^i). \]

This quantity equals the relative frequency with which $\omega_j$, $j \in \{1, \ldots, n\}$, equals $\theta^i$. The empirical vectors $L_n$ take values in the set of probability vectors

\[ \mathcal{P} = \Bigl\{ \nu \in \mathbb{R}^q : \nu = (\nu_1, \ldots, \nu_q), \text{ each } \nu_i \ge 0, \ \sum_{i=1}^q \nu_i = 1 \Bigr\}. \]

Each probability vector in $\mathcal{P}$ represents a possible equilibrium macrostate for the model.

There is a one-to-one correspondence between $\mathcal{P}$ and the set $\mathcal{P}(\Lambda)$ of probability measures on $\Lambda$, $\nu \in \mathcal{P}$ corresponding to the probability measure $\sum_{i=1}^q \nu_i \delta_{\theta^i}$. The element $\rho \in \mathcal{P}$ corresponding to the one-dimensional marginal $\bar{\rho}$ of the prior measures $P_n$ is the uniform vector having equal components $1/q$. For $\omega \in \Lambda^n$ the element of $\mathcal{P}(\Lambda)$ corresponding to the empirical vector $L_n(\omega)$ is the empirical measure of the spin random variables $\omega_1, \omega_2, \ldots, \omega_n$.

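We denote by $\langle \cdot, \cdot \rangle$ the inner product on $\mathbb{R}^q$. Since

\[ \sum_{i=1}^q \Bigl( \sum_{j=1}^n \delta(\omega_j, \theta^i) \Bigr) \Bigl( \sum_{k=1}^n \delta(\omega_k, \theta^i) \Bigr) = \sum_{j,k=1}^n \delta(\omega_j, \omega_k), \]

it follows that the energy per particle can be rewritten as

\[ h_n(\omega) = -\frac{1}{2n^2} \sum_{j,k=1}^n \delta(\omega_j, \omega_k) = -\frac{1}{2} \langle L_n(\omega), L_n(\omega) \rangle, \]

i.e.,

\[ h_n(\omega) = H(L_n(\omega)), \quad \text{where } H(\nu) = -\tfrac{1}{2} \langle \nu, \nu \rangle \text{ for } \nu \in \mathcal{P}. \]

$H$ is the energy representation function for the model.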
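The identity $h_n(\omega) = H(L_n(\omega))$ is easy to check numerically for a random spin configuration. In the sketch below the $q$ spin values are encoded as the integer labels $0, \ldots, q-1$, which is an assumption made for convenience (the text notes that the precise values of the vectors $\theta^i$ are immaterial); the variable names, the values of `q` and `n`, and the random seed are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
q, n = 4, 50
omega = rng.integers(0, q, size=n)      # omega[j] is the label of the spin value of site j

# Direct definition: h_n = -(1/(2 n^2)) * sum_{j,k} delta(omega_j, omega_k)
delta = omega[:, None] == omega[None, :]
h_direct = -delta.sum() / (2.0 * n**2)

# Empirical vector L_n: relative frequency of each of the q spin values
L = np.bincount(omega, minlength=q) / n

def H(nu):
    """Energy representation function H(nu) = -(1/2) <nu, nu>."""
    return -0.5 * np.dot(nu, nu)

h_via_L = H(L)                          # agrees with h_direct up to rounding
```

Since $L_n(\omega)$ is a probability vector, the computed energy per particle always lies between $-\frac{1}{2}$ and $-\frac{1}{2q}$.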

In order to define the sets of equilibrium macrostates with respect to the three ensembles, we appeal to Sanov's Theorem. This states that with respect to the product measures $P_n$, the empirical vectors $L_n$ satisfy the LDP on $\mathcal{P}$ with rate function given by the relative entropy $R(\cdot|\rho)$ [19, Thm. VIII.2.1]. For $\nu \in \mathcal{P}$ this is defined by

\[ R(\nu|\rho) = \sum_{i=1}^q \nu_i \log(q \nu_i). \]

With the choices $I = R(\cdot|\rho)$, $H = -\frac{1}{2} \langle \cdot, \cdot \rangle$, and $a_n = n$, $L_n$ satisfies the LDP on $\mathcal{P}$ with respect to each of the three ensembles with the rate functions given by (2.7), (2.10), and (2.13). In turn, the corresponding sets of equilibrium macrostates are given by

\[ \mathcal{E}^u = \{\nu \in \mathcal{P} : R(\nu|\rho) \text{ is minimized subject to } H(\nu) = u\}, \]

\[ \mathcal{E}_\beta = \{\nu \in \mathcal{P} : R(\nu|\rho) + \beta H(\nu) \text{ is minimized}\}, \]

and

\[ \mathcal{E}_{\beta,\gamma} = \{\nu \in \mathcal{P} : R(\nu|\rho) + \beta H(\nu) + \gamma [H(\nu)]^2 \text{ is minimized}\}. \]

Each element $\nu$ in $\mathcal{E}^u$, $\mathcal{E}_\beta$, and $\mathcal{E}_{\beta,\gamma}$ describes an equilibrium configuration of the model with respect to the corresponding ensemble in the thermodynamic limit. The $i$th component $\nu_i$ gives the asymptotic relative frequency of spins taking the value $\theta^i$.

As in (2.2), the microcanonical entropy is defined by

\[ s(u) = -\inf\{R(\nu|\rho) : \nu \in \mathcal{P}, H(\nu) = u\}. \]

Since $R(\nu|\rho) < \infty$ for all $\nu \in \mathcal{P}$, dom $s$ equals the range of $H(\nu) = -\frac{1}{2} \langle \nu, \nu \rangle$ on $\mathcal{P}$, which is the closed interval $[-\frac{1}{2}, -\frac{1}{2q}]$. The set $\mathcal{E}^u$ of microcanonical equilibrium macrostates is nonempty precisely for $u \in$ dom $s$. For $q = 3$, the microcanonical entropy can be determined explicitly. For all $q \ge 4$ the microcanonical entropy can also be determined explicitly provided Conjecture 4.1 in [10] is valid; this conjecture has been verified numerically for all $q \in$ {4, 5, . . . , 10^}. The formulas for the microcanonical entropy are given in Theorem 4.3 in [10].

FIG. 1: Schematic graph of $s(u)$, showing the set $F = (u_\ell, u_0) \cup \{u_r\}$ of full ensemble equivalence, the singleton set $P = \{u_0\}$ of partial equivalence, and the set $N = (u_0, u_r)$ of nonequivalence, where $u_\ell = -\frac{1}{2}$ and $u_r = -\frac{1}{2q}$. For $u \in F \cup P = (u_\ell, u_0] \cup \{u_r\}$, $s(u) = s^{**}(u)$; for $u \in N$, $s(u) < s^{**}(u)$ and the graph of $s^{**}$ consists of the dotted line segment with slope $\beta_c$. The slope of $s$ at $u_\ell$ is $\infty$. The quantity $w_0$ is discussed after Conjecture 5.1.

We first consider the relationships between $\mathcal{E}^u$ and $\mathcal{E}_\beta$, which according to Theorem 3.1 are determined by support properties of $s$. These properties can be seen in Figure 1. The quantity $u_0$ appearing in this figure equals $[-q^2 + 3q - 3]/[2q(q - 1)]$ [10, Lem. 6.1]. Figure 1 is not the actual graph of $s$ but a schematic graph that accentuates the shape of the graph of $s$ together with the intervals of strict concavity and nonconcavity of this function.

These and other details of the graph of $s$ are also crucial in analyzing the relationships between $\mathcal{E}^u$ and $\mathcal{E}_{\beta,\gamma}$. Denote dom $s$ by $[u_\ell, u_r]$, where $u_\ell = -\frac{1}{2}$ and $u_r = -\frac{1}{2q}$. These details include the observation that there exists $w_0 \in (u_0, u_r)$ such that $s$ is a concave-convex function with break point $w_0$; i.e., the restriction of $s$ to $(u_\ell, w_0)$ is strictly concave and the restriction of $s$ to $(w_0, u_r)$ is strictly convex. A difficulty with this determination is that for certain values of $q$, including $q = 3$, the intervals of strict concavity and strict convexity are shallow and therefore difficult to discern. Furthermore, what seem to be strictly concave and strictly convex portions of this function on the scale of the entire graph might reveal themselves to be much less regular on a finer scale. Conjecture 5.1 gives a set of properties of $s$ implying there exists $w_0 \in (u_0, u_r)$ such that $s$ is a concave-convex function with break point $w_0$. In particular, this property of $s$ guarantees that $s$ has the support properties stated in the three items appearing in the next paragraph. Conjecture 5.1 has been verified numerically for all $q \in$ {4, 5, . . . , 10^}.

We define the sets

\[ F = (u_\ell, u_0) \cup \{u_r\}, \quad P = \{u_0\}, \quad \text{and} \quad N = (u_0, u_r). \]

Figure 1 and Theorem 3.1 then show that these sets are respectively the sets of full equivalence, partial equivalence, and nonequivalence of the microcanonical and canonical ensembles. The details are given in the next three items. In Theorem 6.2 in [10] all these conclusions concerning ensemble equivalence and nonequivalence are proved analytically without reference to the form of $s$ given in Figure 1.

1. $s$ is strictly concave on the interval $(u_\ell, u_0)$ and has a strictly supporting line at each $u \in (u_\ell, u_0)$ and at $u_r$. Hence for $u \in F = (u_\ell, u_0) \cup \{u_r\}$ the ensembles are fully equivalent in the sense that there exists $\beta$ such that $\mathcal{E}^u = \mathcal{E}_\beta$ [Thm. 3.1(a)].

2. $s$ is concave but not strictly concave at $u_0$ and has a nonstrictly supporting line at $u_0$ that also touches the graph of $s$ over the right hand endpoint $u_r$. Hence for $u \in P = \{u_0\}$ the ensembles are partially equivalent in the sense that there exists $\beta$ such that $\mathcal{E}^u \subset \mathcal{E}_\beta$ but $\mathcal{E}^u \ne \mathcal{E}_\beta$ [Thm. 3.1(b)].

3. $s$ is not concave on $N = (u_0, u_r)$ and has no supporting line at any $u \in N$. Hence for $u \in N$ the ensembles are nonequivalent in the sense that for all $\beta$, $\mathcal{E}^u \cap \mathcal{E}_\beta = \emptyset$ [Thm. 3.1(c)].

The explicit calculation of the elements of $\mathcal{E}_\beta$ and $\mathcal{E}^u$ given in [10] shows different continuity properties of these two sets. $\mathcal{E}_\beta$ undergoes a discontinuous phase transition as $\beta$ increases through the critical inverse temperature $\beta_c = \frac{2(q-1)}{q-2} \log(q - 1)$, the unique macrostate $\rho$ for $\beta < \beta_c$ bifurcating discontinuously into the $q$ distinct macrostates for $\beta > \beta_c$. By contrast, $\mathcal{E}^u$ undergoes a continuous phase transition as $u$ decreases from the maximum value $u_r = -\frac{1}{2q}$, the unique macrostate $\rho$ for $u = u_r$ bifurcating continuously into the $q$ distinct macrostates for $u < u_r$. The different continuity properties of these phase transitions show already that the canonical and microcanonical ensembles are nonequivalent.
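The discontinuous canonical transition can be observed with a brute-force computation for $q = 3$. The sketch below minimizes $R(\nu|\rho) + \beta H(\nu)$ over a finite grid on the simplex of probability vectors. The grid resolution `m`, the two sample values of $\beta$, and all function names are assumptions chosen for illustration; a grid search only approximates the true minimizers, and when several minimizers exist it returns just one of them.

```python
import numpy as np

def rel_entropy(nu, q):
    """R(nu|rho) = sum_i nu_i log(q nu_i) for uniform rho, with 0 log 0 = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(nu > 0.0, nu * np.log(q * nu), 0.0)
    return terms.sum(axis=-1)

def energy(nu):
    """Energy representation function H(nu) = -(1/2) <nu, nu>."""
    return -0.5 * np.sum(nu * nu, axis=-1)

def simplex_grid(m):
    """All vectors (i, j, m - i - j)/m with nonnegative integer entries (q = 3)."""
    i, j = np.meshgrid(np.arange(m + 1), np.arange(m + 1), indexing="ij")
    keep = i + j <= m
    return np.stack([i[keep], j[keep], m - i[keep] - j[keep]], axis=-1) / m

def canonical_minimizer(beta, m=60):
    """One grid point minimizing R(nu|rho) + beta * H(nu) for q = 3."""
    nu = simplex_grid(m)
    values = rel_entropy(nu, 3) + beta * energy(nu)
    return nu[np.argmin(values)]

beta_c = 4.0 * np.log(2.0)              # 2(q-1)/(q-2) * log(q-1) evaluated at q = 3
nu_below = canonical_minimizer(2.0)     # a value of beta below beta_c
nu_above = canonical_minimizer(3.5)     # a value of beta above beta_c
```

Below $\beta_c$ the search returns the uniform vector, while above $\beta_c$ it returns a vector dominated by a single component, one of the $q$ symmetry-related minimizers.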

For $u$ in the interval $N$ of ensemble nonequivalence, the graph of $s^{**}$ is affine; this is depicted by the dotted line segment in Figure 1. One can show that the slope of the affine portion of the graph of $s^{**}$ equals the critical inverse temperature $\beta_c$.
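The statement about the slope can be checked arithmetically from the formulas for $u_0$, $u_r$, and $\beta_c$ quoted above. The sketch below relies on one assumption that is not stated in this section: that the microcanonical minimizer at $u_0$ is the ordered macrostate with one component equal to $(q-1)/q$ and the remaining $q - 1$ components equal to $1/(q(q-1))$, so that $s(u_0)$ is minus its relative entropy. The value $s(u_r) = 0$ follows because the uniform vector $\rho$ is the only probability vector with energy $u_r$. Function names are illustrative.

```python
import math

def cwp_constants(q):
    """Endpoints of dom s, the point u_0, and beta_c, from the formulas in the text."""
    u_l = -0.5
    u_r = -1.0 / (2 * q)
    u_0 = (-q**2 + 3 * q - 3) / (2 * q * (q - 1))
    beta_c = 2 * (q - 1) / (q - 2) * math.log(q - 1)
    return u_l, u_0, u_r, beta_c

def chord_check(q):
    """Energy of the assumed ordered macrostate and slope of the chord of s
    joining (u_0, s(u_0)) and (u_r, s(u_r))."""
    # ASSUMPTION: ordered macrostate at the transition (not taken from this section).
    nu = [(q - 1) / q] + [1.0 / (q * (q - 1))] * (q - 1)
    energy = -0.5 * sum(x * x for x in nu)              # H(nu)
    rel_ent = sum(x * math.log(q * x) for x in nu)      # R(nu|rho)
    _, u_0, u_r, _ = cwp_constants(q)
    s_u0 = -rel_ent      # under the assumption above
    s_ur = 0.0           # rho is the only macrostate with energy u_r, and R(rho|rho) = 0
    slope = (s_ur - s_u0) / (u_r - u_0)
    return energy, slope
```

Under the stated assumption, the energy of the ordered macrostate equals $u_0$ and the chord slope equals $\beta_c$ for every $q \ge 3$, consistent with the affine portion of $s^{**}$ over $N$.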

This completes the discussion of the equivalence and nonequivalence of the microcanonical and canonical ensembles. The equivalence and nonequivalence of the microcanonical and Gaussian ensembles depends on the relationships between the sets $\mathcal{E}^u$ and $\mathcal{E}_{\beta,\gamma}$ of corresponding equilibrium macrostates, which in turn are determined by support properties of the generalized microcanonical entropy $s_\gamma(u) = s(u) - \gamma u^2$. As we just saw, for each $u \in N = (u_0, u_r)$, the microcanonical and canonical ensembles are nonequivalent. For $u \in N$ we would like to recover equivalence by replacing the canonical ensemble by an appropriate Gaussian ensemble.

Unfortunately, Theorem 4.2 is not applicable. Although the first three of its hypotheses are valid,
$s''$ is not bounded above on the interior of dom $s$. Indeed, using the explicit formula for
$s$ given in Theorem 4.3 in [10], one verifies that $\lim_{u \to (u_r)^-} s''(u) = \infty$. However, we can appeal to
Theorem 4.3, which is applicable since $s$ is twice continuously differentiable on $N$. We conclude that for
each $u \in N$ and all sufficiently large $\gamma$ there exists a corresponding Gaussian ensemble that is equivalent to
the microcanonical ensemble for that $u$.

By using other conjectured properties of the microcanonical entropy, we are able to deduce the stronger
result on the equivalence of the microcanonical and Gaussian ensembles stated in Theorem 5.2. As before,
we denote dom $s$ by $[u_\ell, u_r]$, where $u_\ell = -\frac{1}{2}$ and $u_r = -\frac{1}{2q}$, and write

$s'(u_\ell) = \lim_{u \to (u_\ell)^+} s'(u)$ and $s'(u_r) = \lim_{u \to (u_r)^-} s'(u)$

with a similar notation for $s''(u_\ell)$ and $s''(u_r)$. Using the explicit but complicated formula for $s$ given in
Theorem 4.2 in [10], the following conjecture was verified numerically for all $q \in \{4, 5, \ldots, 10^{?}\}$ and all
$u \in (u_\ell, u_r)$ of the form $u = u_\ell + 0.02k$, where $k$ is a positive integer.

Conjecture 5.1. For all $q > 3$ the microcanonical entropy $s$ has the following two properties.

(a) $s'''(u) > 0$ for all $u \in (u_\ell, u_r)$.

(b) $s'(u_\ell) = \infty$, $0 < s'(u_r) < \infty$, $s''(u_\ell) = -\infty$, and $s''(u_r) = \infty$.

The conjecture implies that $s''$ is an increasing bijection of $(u_\ell, u_r)$ onto $\mathbb{R}$. Therefore, there exists a
unique point $w_0 \in (u_\ell, u_r)$ such that $s''(u) < 0$ for all $u \in (u_\ell, w_0)$, $s''(w_0) = 0$, and $s''(u) > 0$ for all
$u \in (w_0, u_r)$. It follows that the restriction of $s$ to $[u_\ell, w_0]$ is strictly concave and the restriction of $s$ to
$[w_0, u_r]$ is strictly convex. These properties, which can be seen in Figure 1, are summarized by saying that
$s$ is a concave-convex function with break point $w_0$.

The interval $N = (u_0, u_r)$ exhibited in Figure 1 contains all energy values $u$ for which there exists no
canonical ensemble that is equivalent with the microcanonical ensemble. Assuming the truth of Conjecture
5.1, we now show that for each $u \in N$ there exists $\gamma > 0$ and an associated Gaussian ensemble that is
equivalent with the microcanonical ensemble for all $v < u$. In order to do this, for $\gamma > 0$ we bring in the
generalized microcanonical entropy

$s_\gamma(u) = s(u) - \gamma u^2$
and note that the properties of $s$ stated in Conjecture 5.1 are invariant under the addition of the quadratic
$-\gamma u^2$. Hence, if Conjecture 5.1 is valid, then $s_\gamma$ satisfies the same properties as $s$. In particular, $s_\gamma$ must
be a concave-convex function with some break point $w_\gamma$, which is the unique point in $(u_\ell, u_r)$ such that
$s_\gamma''(u) < 0$ for all $u \in (u_\ell, w_\gamma)$, $s_\gamma''(w_\gamma) = 0$, and $s_\gamma''(u) > 0$ for all $u \in (w_\gamma, u_r)$. A straightforward
argument, which we omit, and an appeal to Theorem 3.3 show that there exists a unique point $u_\gamma \in (u_\ell, w_\gamma)$
having the properties listed in the next three items. These properties show that $u_\gamma$ plays the same role for
ensemble equivalence involving the Gaussian ensemble that the point $u_0$ plays for ensemble equivalence
involving the canonical ensemble.

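1. For $\gamma > 0$, $s_\gamma$ is strictly concave on the interval $(u_\ell, u_\gamma)$ and has a strictly supporting line at each
$u \in (u_\ell, u_\gamma)$ and at $u_r$. Hence for $u \in F_\gamma = (u_\ell, u_\gamma) \cup \{u_r\}$ the ensembles are fully equivalent in
the sense that there exists $\beta$ such that $\mathcal{E}^u = \mathcal{E}(\gamma)_\beta$ [Thm. 3.3(a)].

2. For $\gamma > 0$, $s_\gamma$ is concave but not strictly concave at $u_\gamma$ and has a nonstrictly supporting line at $u_\gamma$
that also touches the graph of $s_\gamma$ over the right hand endpoint $u_r$. Hence for $u \in P_\gamma = \{u_\gamma\}$ the
ensembles are partially equivalent in the sense that there exists $\beta$ such that $\mathcal{E}^u \subset \mathcal{E}(\gamma)_\beta$ but $\mathcal{E}^u \neq \mathcal{E}(\gamma)_\beta$
[Thm. 3.3(b)].

3. For $\gamma > 0$, $s_\gamma$ is not concave on the interval $N_\gamma = (u_\gamma, u_r)$ and has no supporting line at any $u \in N_\gamma$.
Hence for $u \in N_\gamma$ the ensembles are nonequivalent in the sense that for all $\beta$, $\mathcal{E}^u \cap \mathcal{E}(\gamma)_\beta = \emptyset$
[Thm. 3.3(c)].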
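
The invariance claim made just before these three items can be checked by differentiating $s_\gamma(u) = s(u) - \gamma u^2$ directly. The lines below are a short sketch of that check, taking the endpoint values in Conjecture 5.1 as given and using $u_\ell = -\frac{1}{2}$ and $u_r = -\frac{1}{2q}$.

```latex
\begin{align*}
s_\gamma'(u)   &= s'(u) - 2\gamma u, \qquad
s_\gamma''(u)   = s''(u) - 2\gamma, \qquad
s_\gamma'''(u)  = s'''(u) > 0, \\
s_\gamma'(u_\ell) &= s'(u_\ell) + \gamma = \infty, \qquad
s_\gamma'(u_r)   = s'(u_r) + \frac{\gamma}{q} \in (0, \infty), \\
s_\gamma''(u_\ell) &= s''(u_\ell) - 2\gamma = -\infty, \qquad
s_\gamma''(u_r)   = s''(u_r) - 2\gamma = \infty .
\end{align*}
```

Hence $s_\gamma$ satisfies properties (a) and (b) of the conjecture for every $\gamma > 0$, which is what makes $s_\gamma$ concave-convex with a break point $w_\gamma$.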

We now state our main result. 

Theorem 5.2. We assume that Conjecture 5.1 is valid. Then as a function of $\gamma > 0$, $F_\gamma = (u_\ell, u_\gamma) \cup \{u_r\}$
is strictly increasing, and as $\gamma \to \infty$, $F_\gamma \uparrow (u_\ell, u_r]$. It follows that for any $u \in N = (u_0, u_r)$, there exists
$\gamma > 0$ such that the microcanonical ensemble and the Gaussian ensemble defined in terms of this $\gamma$ are fully
equivalent for all $v \in (u_\ell, u_r)$ satisfying $v < u$. The value of $\beta$ defining the Gaussian ensemble is unique
and is given by $\beta = s'(v) - 2\gamma v$.

The proof of the theorem relies on the next lemma, part (a) of which uses Proposition 4.1. When applied
to $s_\gamma$, this proposition states that $s_\gamma$ has a strictly supporting line at a point if and only if $s$ has a strictly
supporting parabola at that point. Proposition 4.1 illustrates why one can achieve full equivalence with the
Gaussian ensemble when full equivalence with the canonical ensemble fails. Namely, even when $s$ does not
have a supporting line at a point, it might have a supporting parabola at that point; in this case the supporting
parabola can be made strictly supporting by increasing $\gamma$. The proofs of parts (b)-(d) of the next lemma
rely on Theorem 4.3 and on the properties of the sets $F_\gamma$, $P_\gamma$, and $N_\gamma$ stated in the three items appearing
just before the last theorem.

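Lemma 5.3. We assume that Conjecture 5.1 is valid. Then the following conclusions hold.

(a) If for some $\gamma > 0$, $s_\gamma$ has a supporting line at a point $u$, then for any $\tilde{\gamma} > \gamma$, $s_{\tilde{\gamma}}$ has a strictly
supporting line at $u$.

(b) For any $0 < \gamma < \tilde{\gamma}$, $F_\gamma \cup P_\gamma \subset F_{\tilde{\gamma}}$.

(c) $u_\gamma$ is a strictly increasing function of $\gamma > 0$ and $\lim_{\gamma \to \infty} u_\gamma = u_r$.

(d) As a function of $\gamma > 0$, $F_\gamma$ is strictly increasing.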

Proof. (a) Suppose that $s_\gamma$ has a supporting line at $u$ with slope $\tilde{\beta}$. Then by Proposition 4.1 $s$ has a
supporting parabola at $u$ with parameters $(\beta, \gamma)$, where $\beta = \tilde{\beta} + 2\gamma u$. As the definition (4.1) makes clear,
replacing $\gamma$ by any $\tilde{\gamma} > \gamma$ makes the supporting parabola at $u$ strictly supporting. Again by Proposition 4.1
$s_{\tilde{\gamma}}$ has a strictly supporting line at $u$.
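
The two uses of Proposition 4.1 in part (a) can be made explicit with a short computation. This is only a sketch: it assumes that a supporting parabola at $u$ with parameters $(\beta, \gamma)$ means $s(v) \le s(u) + \beta(v-u) + \gamma(v-u)^2$ for all $v$, which is our reading of definition (4.1) since that definition is not reproduced in this section.

```latex
% Uses the identity v^2 - u^2 = 2u(v-u) + (v-u)^2.
\begin{align*}
s_\gamma(v) \le s_\gamma(u) + \tilde{\beta}(v-u)
  &\iff s(v) \le s(u) + \tilde{\beta}(v-u) + \gamma\,(v^2 - u^2) \\
  &\iff s(v) \le s(u) + (\tilde{\beta} + 2\gamma u)(v-u) + \gamma\,(v-u)^2 .
\end{align*}
% With \beta = \tilde{\beta} + 2\gamma u, any \tilde{\gamma} > \gamma and any v \neq u:
\[
s(v) \;\le\; s(u) + \beta(v-u) + \gamma\,(v-u)^2 \;<\; s(u) + \beta(v-u) + \tilde{\gamma}\,(v-u)^2 .
\]
```

The strict inequality for $v \neq u$ is what makes the parabola with the larger parameter $\tilde{\gamma}$ strictly supporting.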



(b) If $u \in F_\gamma \cup P_\gamma$, then $s_\gamma$ has a supporting line at $u$. Since $0 < \gamma < \tilde{\gamma}$, part (a) implies that $s_{\tilde{\gamma}}$ has a
strictly supporting line at $u$. Hence $u$ must be an element of $F_{\tilde{\gamma}}$.

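(c) If $0 < \gamma < \tilde{\gamma}$, then by part (a) of the present lemma $P_\gamma = \{u_\gamma\} \subset F_{\tilde{\gamma}}$. Since $F_{\tilde{\gamma}} = (u_\ell, u_{\tilde{\gamma}}) \cup \{u_r\}$
and since $u_\gamma < u_r$, it follows that $u_\gamma < u_{\tilde{\gamma}}$. Thus $u_\gamma$ is a strictly increasing function of $\gamma > 0$. We now
prove that $\lim_{\gamma \to \infty} u_\gamma = u_r$. For any $u \in (u_\ell, u_r)$, part (b) of Theorem 4.3 states that there exists $\gamma_u > 0$
such that $s_{\gamma_u}$ has a strictly supporting line at $u$. It follows that $u \in F_{\gamma_u} = (u_\ell, u_{\gamma_u}) \cup \{u_r\}$ and thus
that $u < u_{\gamma_u} < u_r$. Since $u_\gamma$ is a strictly increasing function of $\gamma$, it follows that for all $\gamma > \gamma_u$, we have
$u_\gamma > u_{\gamma_u} > u$. We have shown that for any $u \in (u_\ell, u_r)$, there exists $\gamma_u > 0$ such that for all $\gamma > \gamma_u$, we
have $u_\gamma > u$. Since $u_\gamma < u_r$ for all $\gamma$, this completes the proof that $\lim_{\gamma \to \infty} u_\gamma = u_r$.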

(d) Since $F_\gamma = (u_\ell, u_\gamma) \cup \{u_r\}$, this follows immediately from the first property of $u_\gamma$ in part (c). The
proof of the lemma is complete. ■

We are now ready to prove Theorem 5.2. The properties of $F_\gamma$ stated there follow immediately from
Lemma 5.3. Indeed, since $u_\gamma$ is a strictly increasing function of $\gamma > 0$, $F_\gamma$ is also strictly increasing. In
addition, since $\lim_{\gamma \to \infty} u_\gamma = u_r$ it follows that as $\gamma \to \infty$, $F_\gamma \uparrow (u_\ell, u_r]$. Since $F_\gamma$ is the set of full
ensemble equivalence, we conclude that for any $u \in N = (u_0, u_r)$, there exists $\gamma > 0$ such that the
microcanonical ensemble and the Gaussian ensemble defined in terms of this $\gamma$ are fully equivalent for all
$v \in (u_\ell, u_r)$ satisfying $v < u$. The last statement concerning $\beta$ is a consequence of part (c) of Theorem 4.3.
The proof of Theorem 5.2 is complete.
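
The monotone growth of $u_\gamma$ asserted in Theorem 5.2 can be illustrated numerically. The sketch below does not use the Curie-Weiss-Potts entropy, whose explicit formula is given in [10] and not reproduced here. Instead it uses a hypothetical toy function $s(u) = 2\sqrt{u} + \frac{2}{3}(1-u)^{3/2}$ on $[0, 1]$, chosen by us only because it satisfies the analogues of properties (a) and (b) of Conjecture 5.1 with $u_\ell = 0$ and $u_r = 1$. All function names are ours. The point $u_\gamma$ is located as the tangency point in $(u_\ell, w_\gamma)$ of the line that supports $s_\gamma$ and also touches its graph at $u_r$.

```python
import math

U_L, U_R = 0.0, 1.0  # endpoints of the toy domain (assumption, not the CWP values)


def s(u):
    """Toy concave-convex 'entropy'; an illustrative assumption, not the CWP formula."""
    return 2.0 * math.sqrt(u) + (2.0 / 3.0) * (1.0 - u) ** 1.5


def ds(u):
    """First derivative of the toy s."""
    return 1.0 / math.sqrt(u) - math.sqrt(1.0 - u)


def d2s(u):
    """Second derivative of the toy s; increases from -inf to +inf on (0, 1)."""
    return -0.5 * u ** -1.5 + 0.5 / math.sqrt(1.0 - u)


def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def break_point(gamma, eps=1e-12):
    """w_gamma: the unique zero of s_gamma'' = s'' - 2*gamma."""
    return bisect(lambda u: d2s(u) - 2.0 * gamma, U_L + eps, U_R - eps)


def u_gamma(gamma, eps=1e-12):
    """Tangency point in (u_l, w_gamma) of the line that also touches s_gamma at u_r."""
    s_g = lambda u: s(u) - gamma * u * u
    ds_g = lambda u: ds(u) - 2.0 * gamma * u
    # Tangent line at u, evaluated at u_r, minus s_gamma(u_r); its zero is u_gamma.
    gap = lambda u: s_g(u) + ds_g(u) * (U_R - u) - s_g(U_R)
    return bisect(gap, U_L + eps, break_point(gamma))


if __name__ == "__main__":
    for gamma in (0.0, 1.0, 10.0, 100.0):
        print(gamma, u_gamma(gamma))
```

For this toy function the computed values of $u_\gamma$ increase with $\gamma$ and approach $u_r = 1$, so the set $(u_\ell, u_\gamma) \cup \{u_r\}$ grows toward $(u_\ell, u_r]$, in line with the statement of the theorem.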